1 Introduction and main results

With his work on the statistical analysis of large samples [69], Wishart initiated the systematic study of large random matrices. Ever since, random matrices have continuously entered more and more areas of mathematics and applied sciences beyond probability theory and statistics, for instance, in numerical analysis through the work of Goldstine and von Neumann [20, 65] and in quantum physics through the works of Wigner [66,67,68] on his famous semicircle law, which resulted in significant effort to understand spectral statistics of random matrices from an asymptotic point of view. Today, random matrix theory has grown into a vital area of probability theory and statistics, and within the last two decades, random matrices have come to play a major role in many areas of (algorithmic) computational mathematics, for instance, in questions related to sparsification methods [1, 54] and sparse approximation [57, 58], dimension reduction [4, 12, 44], or combinatorial optimization [46, 53]. We refer the reader to [5, 6, 60] for more information.

In this paper, we are interested in the non-asymptotic theory of (large) random matrices. This theory has played a fundamental role in geometric functional analysis at least since the 1970s, with the connection coming in various flavors. It is of particular importance in the geometry of Banach spaces and the theory of operator algebras [9, 10, 15, 18, 21, 30] and their applications to high-dimensional problems, for instance, in convex geometry [17, 22], compressed sensing [14, 16, 48, 63], information-based complexity [27, 28], or statistical learning theory [50, 64]. On the other hand, geometric functional analysis had and still has an enduring influence on random matrix theory, as is witnessed, for instance, through applications of measure concentration techniques; we refer to [15, 42] and the references cited therein. The quantity we focus on here is the expected operator norm of random matrices considered as operators between finite-dimensional \(\ell _p\) spaces; recall that \(\ell _p^n\) denotes the space \(\mathbb {R}^n\) equipped with the (quasi-)norm \(\Vert \cdot \Vert _p\), given by \(\Vert (x_j)_{j=1}^n\Vert _p = (\sum _{j=1}^n |x_j|^p)^{1/p}\) for \(0<p<\infty \) and \(\Vert (x_j)_{j=1}^n\Vert _\infty = \max _{j\le n}|x_j|\) for \(p=\infty \). We address the following problem: for \(1 \le p,q \le \infty \) and \(m,n\in \mathbb {N}\), determine the right order (up to constants that may depend on the parameters p and q) of

$$\begin{aligned} \mathbb {E}\Vert X_A:\ell ^n_p \rightarrow \ell ^m_q\Vert , \end{aligned}$$

where, given a deterministic real \(m\times n\) matrix \(A = (a_{ij})_{i\le m, j\le n}\) and a random matrix \(X = (X_{ij})_{i\le m, j\le n}\), we denote by

$$\begin{aligned} X_A {:=}A\mathbin {\circ }X = (a_{ij} X_{ij})_{i\le m, j\le n} \end{aligned}$$

the structured random matrix; the symbol \(\mathbin {\circ }\) stands for the Hadamard product of matrices (i.e., entrywise multiplication). The bounds on the expected operator norm should be of optimal order and expressed in terms of the coefficients \(a_{ij}\), \(i\le m,j\le n\). Understanding such expressions and related quantities is important, for instance, when studying the worst-case error of optimal algorithms which are based on random information in function approximation problems [28] (see also [33]) or the quality of random information for the recovery of vectors from an \(\ell _p\)-ellipsoid, where (the radius of) optimal information is given by Gelfand numbers of a diagonal operator [29].
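To make the object of study concrete, here is a minimal numerical sketch (assuming NumPy; the structure matrix A, the dimensions, and the number of Monte Carlo trials are illustrative choices, not taken from the paper). It estimates \(\mathbb {E}\Vert G_A:\ell _1^n\rightarrow \ell _q^m\Vert \); we restrict to \(p=1\), where the operator norm reduces to the largest \(\ell _q\) norm of a column (see Lemma 2.1 below), since for general p, q the operator norm is not efficiently computable.

```python
import numpy as np

rng = np.random.default_rng(0)

def opnorm_1_to_q(B, q):
    # ||B : l_1^n -> l_q^m||: the extreme points of B_1^n are +/- e_j,
    # so the norm equals the largest l_q norm of a column of B
    # (see Lemma 2.1 below).
    return np.linalg.norm(B, ord=q, axis=0).max()

def expected_opnorm_mc(A, q, trials=500):
    # Monte Carlo estimate of E||G_A : l_1^n -> l_q^m|| for Gaussian G.
    m, n = A.shape
    samples = [opnorm_1_to_q(A * rng.standard_normal((m, n)), q)
               for _ in range(trials)]
    return float(np.mean(samples))

A = rng.uniform(size=(50, 40))   # an illustrative structure matrix
print(expected_opnorm_mc(A, q=3.0))
```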

In the case where the random entries of X are i.i.d. standard Gaussians (then we write \(G_A\) instead of \(X_A\)) and \(1\le p,q \le \infty \), we will show the following bound, which is sharp up to logarithmic terms:

$$\begin{aligned} D_1 + D_2 \lesssim \mathbb {E}\Vert G_A :\ell ^n_p\rightarrow \ell ^m_q\Vert \lesssim (\ln n)^{1/p^*} (\ln m)^{1/q} \bigl [ \sqrt{\ln (mn)} D_1 +\sqrt{\ln n} D_2\bigr ], \end{aligned}$$
(1.1)

where \(D_1 {:=}\Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}\), \(D_2 {:=}\Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2}\), and \(p^*\) denotes the Hölder conjugate of p defined by the relation \(1/p+1/p^*=1\). As will be explained later, we obtain sharp estimates in certain cases and derive results similar to (1.1) for other models of randomness.

1.1 History of the problem and known results

In what follows, \(A = (a_{ij})_{i,j}\) is a real deterministic matrix and \(G=(g_{ij})_{i,j}\) always stands for a random matrix with i.i.d. standard Gaussian entries (usually the matrices are of size \(m\times n\) unless explicitly stated otherwise). We use C(r), C(rK), etc. for positive constants which may depend only on the parameters given in brackets and write \(C, C', c,c',\dots \) for positive absolute constants. The symbols \(\lesssim \), \(\lesssim _{r}\), \(\lesssim _{r, K}\), etc. denote that the inequality holds up to multiplicative constants depending only on the parameters given in the subscripts; we write \(a\asymp b\) if \(a\lesssim b\) and \(b\lesssim a\), and \(\asymp _r\), \(\asymp _{r,K}\), etc. if the constants may depend on the parameters given in the subscript.

In 1975, Bennett, Goodman, and Newman [9] proved that if X is an \(m\times n\) random matrix with independent, mean-zero entries taking values in \([-1,1]\), and \(2\le q < \infty \), then

$$\begin{aligned} \mathbb {E}\Vert X:\ell ^n_2 \rightarrow \ell ^m_q\Vert \lesssim _q \max \{n^{1/2}, m^{1/q}\}. \end{aligned}$$
(1.2)

In fact, up to constants, this estimate is best possible: for any \(m\times n\) matrix \(X'\) with \(\pm 1\) entries one readily sees that \( \Vert X':\ell ^n_2 \rightarrow \ell ^m_q\Vert \ge \max \{n^{1/2}, m^{1/q}\}\); just use standard unit vectors and operator duality. Moreover, in this ‘unstructured’ case, where \(a_{ij}=1\) for all i, j, it is easy to extend (1.2) to the whole range of \(p, q\in [1,\infty ]\) (see [8, 13] or Remark 4.2 below). Also, if all entries are i.i.d. Rademacher random variables, then the bounds are two-sided, i.e., the expected operator norm is, up to constants, the same as the minimal norm for all p, q (see [8, Proposition 3.2] or [13, Satz 2]).

The case most studied in the literature is the one of the spectral norm, i.e., the \(\ell _2^n \rightarrow \ell _2^m\) operator norm. Seginer [51] proved in 2000 that if \(X = (X_{ij})_{i\le m, j\le n}\) is an \(m\times n\) random matrix with i.i.d. mean-zero entries, then its operator norm is of the same order as the sum of expectations of the maximum Euclidean norm of rows and columns of X, i.e.,

$$\begin{aligned} \mathbb {E}\Vert X:\ell _{2}^n\rightarrow \ell _2^m\Vert&\asymp \mathbb {E}\max _{j \le n} \Vert (X_{ij})_{i=1}^m\Vert _2 + \mathbb {E}\max _{i \le m}\Vert (X_{ij})_{j=1}^n\Vert _{2}. \end{aligned}$$
(1.3)

Riemer and Schütt [49] proved that, up to a logarithmic factor \(\ln (en)^2\), the same holds true for any random matrix with independent but not necessarily identically distributed mean-zero entries. Let us also mention that in the Gaussian setting one can use a non-commutative Khintchine bound (see, e.g., [59, Equation (4.9)]) to show that, up to a factor \(\sqrt{\ln n}\), the expected spectral norm is of the order of the largest Euclidean norm of its rows and columns.

In the very same setting that was considered by Riemer and Schütt, Latała [37] had obtained a few years earlier the dimension-free estimate

$$\begin{aligned} \mathbb {E}\Vert X :\ell _{2}^n\rightarrow \ell _2^m\Vert \lesssim \max _{j\le n} \Bigl (\sum _{i=1}^m \mathbb {E}X_{ij}^2\Bigr )^{1/2} + \max _{i\le m} \Bigl (\sum _{j=1}^n \mathbb {E}X_{ij}^2\Bigr )^{1/2} + \Bigl (\sum _{i=1}^m \sum _{j=1}^n \mathbb {E}X_{ij}^4\Bigr )^{1/4}. \end{aligned}$$

This bound is superior to the Riemer–Schütt bound in the case of matrices with all entries equal to 1 and is optimal for Wigner matrices. In other cases, like the one of diagonal matrices, the Riemer–Schütt bound is better.

In the case of structured Gaussian matrices, Latała, van Handel, and Youssef [40], building upon earlier work of Bandeira and van Handel [7] (which combined the moment method with combinatorial considerations) as well as results proved by van Handel in [61] (which used Slepian’s lemma), obtained the precise behavior without any logarithmic terms in the dimension, namely

$$\begin{aligned} \mathbb {E}\Vert G_A :\ell _{2}^n\rightarrow \ell _2^m\Vert &\asymp \mathbb {E}\max _{j \le n} \Vert (a_{ij} g_{ij})_{i=1}^m\Vert _2 + \mathbb {E}\max _{i \le m}\Vert (a_{ij} g_{ij})_{j=1}^n\Vert _{2} \\ &\asymp \max _{j \le n} \Vert (a_{ij})_{i=1}^m\Vert _2 + \max _{i \le m}\Vert (a_{ij})_{j=1}^n\Vert _{2} + \mathbb {E}\max _{i\le m, j\le n}|a_{ij}g_{ij}|. \end{aligned}$$
(1.4)

Their proof is based on a clever block decomposition of the underlying matrix (see [40, Fig. 3.1]). This result finally answered in the affirmative a conjecture made by Latała more than a decade before. We also refer the reader to the survey [62] discussing in quite some detail results prior to [40] and [61]—the latter work discusses the conjectures of Latała and van Handel and shows their equivalence.
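As a quick empirical illustration (not part of the argument in [40]), the following sketch compares a sampled value of the left-hand side of (1.4) with the deterministic terms from its second line; NumPy is assumed, and the banded matrix A and the sample sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
m = n = 100
# an illustrative banded structure matrix A
A = (np.abs(np.subtract.outer(np.arange(m), np.arange(n))) <= 5).astype(float)

trials = 50
G = rng.standard_normal((trials, m, n))
lhs = np.mean([np.linalg.norm(A * g, 2) for g in G])   # sampled E||G_A||

col = np.linalg.norm(A, axis=0).max()                  # max_j ||(a_ij)_i||_2
row = np.linalg.norm(A, axis=1).max()                  # max_i ||(a_ij)_j||_2
maxent = np.mean([np.abs(A * g).max() for g in G])     # E max |a_ij g_ij|
print(lhs, col + row + maxent)                         # same order, as in (1.4)
```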

Very recently, Latała and Świątkowski [39] investigated a similar problem when the underlying random matrix has Rademacher entries. They proved a lower bound which, up to a \(\ln \ln n\) factor, can be reversed for randomized \(n\times n\) circulant matrices.

In [23], Guédon, Hinrichs, Litvak, and Prochno studied our main and motivating question on the order of the expected operator norm of structured random matrices considered as operators between \(\ell _p^n\) and \(\ell _q^m\) in the special case where \( p\le 2\le q \) and the random entries are Gaussian. In this situation, where we are not dealing with the spectral norm, the moment method cannot be employed. The approach in [23] was therefore different and based on a majorizing measure construction combining the works [24] and [25]. In [23, Theorem 1.1], the authors proved that if \(1< p\le 2\le q < \infty \), then

$$\begin{aligned} \mathbb {E}\Vert G_A:\ell _{p}^n\rightarrow \ell _q^m\Vert &\lesssim \gamma _q \max _{j \le n}\Vert (a_{ij})_{i=1}^m\Vert _q + (p^*)^{5/q} (\ln m)^{1/q} \gamma _{p^*}\max _{i \le m}\Vert (a_{ij})_{j=1}^n\Vert _{p^*} \\ &\quad + (p^*)^{5/q} (\ln m)^{1/q} \gamma _q \, \mathbb {E}\max _{i\le m, j\le n}|a_{ij}g_{ij}|, \end{aligned}$$
(1.5)

where \(\gamma _r {:=}(\mathbb {E}|g|^r)^{1/r}\) for a standard Gaussian random variable g. Moreover, for \(p = 1\) and \(q \ge 2\), it was noted in [23, Remark 1.4] (see also [45, Twierdzenie 2]) that

$$\begin{aligned} \mathbb {E}\Vert G_A:\ell _{1}^n\rightarrow \ell _q^m\Vert&\lesssim \sqrt{q} \max _{j\le n} \Vert (a_{ij})_{i=1}^m\Vert _q + \mathbb {E}\max _{i\le m, j\le n}|a_{ij}g_{ij}|. \end{aligned}$$
(1.6)

Later, an extension of (1.5) to the case of matrices with i.i.d. isotropic log-concave rows was obtained by Strzelecka in [55].

Trying to extend the upper bound for \(\mathbb {E}\Vert G_A:\ell _p^n \rightarrow \ell _q^m\Vert \) to the whole range \(1\le p, q\le \infty \), one encounters two difficulties. First of all, the methods used to prove (1.5) fail if \(q\le 2\) or \(p \ge 2\), because the majorizing measure construction used in [23] is restricted to the case \(q\ge 2\) and the assumption \(1<p\le 2\) is required in a Hölder bound. Moreover, when \(q\le 2\) or \(p \ge 2\), the result cannot hold with the right-hand side of the same form as in (1.5) (see Remark 4.2 below for counterexamples to (1.5) in the cases \(q\le 2\) and \(p \ge 2\)). This explains the different form of the expressions \(D_1\) and \(D_2\) in (1.1), which in the range \(p \le 2 \le q\) reduce to the maxima of norms on the right-hand side of (1.5)—see (1.9) below.

1.2 Lower bounds and conjectures

By arguments similar to the ones used in order to prove the lower bound in (1.4), one can check that in the range considered in [23, 45] (i.e., \(1\le p\le 2\le q \le \infty \)) one has

$$\begin{aligned} \mathbb {E}\Vert G_A:\ell _{p}^n\rightarrow \ell _q^m\Vert &\gtrsim _{p,q} \max _{j \le n}\Vert (a_{ij})_{i=1}^m\Vert _q + \max _{i \le m}\Vert (a_{ij})_{j=1}^n\Vert _{p^*} \\ &\quad + \mathbb {E}\max _{i\le m, j\le n}|a_{ij}g_{ij}|. \end{aligned}$$
(1.7)

Note that for \(p=1\),

$$\begin{aligned} \max _{i \le m}\Vert (a_{ij})_{j=1}^n\Vert _{p^*} = \max _{i\le m,j\le n} |a_{ij}| \le \sqrt{\pi /2}\,\mathbb {E}\max _{i\le m, j\le n}|a_{ij}g_{ij}|, \end{aligned}$$

which explains the simplified form of (1.6).

We remark that the proof of (1.7) is based merely on the observation that the operator norm is greater than the maximum entry of the matrix and the appropriate maximum norms of its rows and columns, combined with a comparison of moments for Gaussian random vectors. Another but related way to proceed, valid for all \(1 \le p, q \le \infty \), is to exchange the expectation and the suprema over the \(\ell _p^n\) and \(\ell _{q^*}^m\) balls in the definition of the operator norm. We present the details in Sect. 5.1. In particular, Proposition 5.1 and Corollary 5.2 imply that, for \(1\le p,q \le \infty \),

$$\begin{aligned} \mathbb {E}\Vert G_A:\ell _{p}^n\rightarrow \ell _q^m\Vert &\gtrsim \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2} \Vert ^{1/2} + \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2} \\ &\quad + \mathbb {E}\max _{i\le m, j\le n}|a_{ij}g_{ij}|. \end{aligned}$$
(1.8)

It is an easy observation (see Lemma 2.1 below) that for \(p\le 2 \le q\),

$$\begin{aligned} \begin{aligned} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2} \Vert ^{1/2}&= \max _{j \le n}\Vert (a_{ij})_{i=1}^m\Vert _q, \\ \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2}&= \max _{i \le m}\Vert (a_{ij})_{j=1}^n\Vert _{p^*}. \end{aligned} \end{aligned}$$
(1.9)

Thus, in the range \(1 \le p \le 2 \le q < \infty \) considered in [23, 45], the lower bounds (1.7) and (1.8) coincide.

Although it would be natural to conjecture at this point that the bound (1.8) may be reversed up to a multiplicative constant depending only on pq, such a reverse bound turns out not to be true in the case \(p\le q< 2\) (and in the dual one \(2< p\le q\)) as we shall show in Sect. 5.3.

In order to conjecture the right asymptotic behavior of \(\mathbb {E}\Vert G_A:\ell _p^n \rightarrow \ell _q^m\Vert \), one may take a look at the boundary values of p and q, i.e., \(p\in \{1,\infty \}\) or \(q\in \{1, \infty \}\). Note that (1.6) provides an asymptotic behavior of \(\mathbb {E}\Vert G_A:\ell _p^n \rightarrow \ell _q^m\Vert \) on a part of this boundary (i.e., for \(p=1\) and \(2\le q\le \infty \) and in the dual case \(q=\infty \) and \(1\le p\le 2\)). We provide sharp results on the remaining parts of the boundary of \([1,\infty ]\times [1,\infty ]\) (see dense lines on the boundary of Fig. 1 below):

$$\begin{aligned} \mathbb {E}\Vert G_A :\ell ^n_p\rightarrow \ell ^m_1\Vert &\asymp _{p} D_1 +D_2&&\qquad \text {for all } 1<p\le \infty , \\ \mathbb {E}\Vert G_A :\ell ^n_\infty \rightarrow \ell ^m_q\Vert &\asymp _{q} D_1 +D_2&&\qquad \text {for all } 1\le q<\infty ,\\ \mathbb {E}\Vert G_A :\ell ^n_1\rightarrow \ell ^m_q\Vert &\asymp \, D_1 + \max _{j\le n} \bigl (\sqrt{\ln (j+1)}\, b_j^{\downarrow {}}\bigr )&&\qquad \text {for all } 1\le q\le 2, \\ \mathbb {E}\Vert G_A:\ell _p^n \rightarrow \ell _\infty ^m\Vert &\asymp \, D_2+ \max _{i\le m} \bigl (\sqrt{\ln (i+1)}\, d_i^{\downarrow {}}\bigr )&&\qquad \text {for all } 2\le p\le \infty , \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} D_1&{:=}\Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2},\\ D_2&{:=}\Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2}, \end{aligned} \qquad \qquad \begin{aligned} b_j&{:=}\Vert (a_{ij})_{i\le m} \Vert _{2q/(2-q)}, \\ d_i&{:=}\Vert (a_{ij})_{j\le n} \Vert _{2p/(p-2)}, \end{aligned} \end{aligned}$$

and with \((x_1^{\downarrow {}}, \ldots ,x_n^{\downarrow {}})\) denoting the non-increasing rearrangement of \((|x_1|,\ldots , |x_n|)\) for a given \((x_j)_{j\le n}\in \mathbb {R}^n\). (For the precise formulation see Propositions 1.8 and 1.10, and Corollary 1.11 below.)
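The quantities \(b_j\), \(d_i\), and the log-weighted maxima of their non-increasing rearrangements are straightforward to evaluate. The following sketch (NumPy; the diagonal matrix A and the exponents are illustrative, chosen with \(q<2<p\) so that the exponents 2q/(2-q) and 2p/(p-2) are finite and positive) computes them.

```python
import numpy as np

def weighted_rearranged_max(v):
    # max_j sqrt(ln(j+1)) * v_j^down, with v^down the non-increasing
    # rearrangement of (|v_j|).
    v_down = np.sort(np.abs(v))[::-1]
    j = np.arange(1, len(v) + 1)
    return float((np.sqrt(np.log(j + 1)) * v_down).max())

def b_and_d_terms(A, p, q):
    # b_j = ||(a_ij)_i||_{2q/(2-q)} (needs q < 2) and
    # d_i = ||(a_ij)_j||_{2p/(p-2)} (needs p > 2).
    b = np.linalg.norm(A, ord=2 * q / (2 - q), axis=0)
    d = np.linalg.norm(A, ord=2 * p / (p - 2), axis=1)
    return weighted_rearranged_max(b), weighted_rearranged_max(d)

A = np.diag(np.linspace(1.0, 0.1, 30))    # an illustrative diagonal A
print(b_and_d_terms(A, p=4.0, q=1.5))
```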

Moreover, in Sect. 5.1 we generalize the lower bounds from the boundary into the whole range \((p,q)\in [1,\infty ]\times [1,\infty ]\) (see Fig. 1 below), i.e., we prove

$$\begin{aligned} \mathbb {E}\Vert G_A:\ell _{p}^n\rightarrow \ell _q^m\Vert \gtrsim _{p,q} D_1+D_2 + {\left\{ \begin{array}{ll} \mathbb {E}\max _{i \le m,j\le n} |a_{ij}g_{ij}| & \text {if }\ p\le 2\le q,\\ \max _{j\le n}\sqrt{\ln (j+1)}\, b_j^{\downarrow {}} & \text {if }\ p\le q\le 2,\\ \max _{i\le m}\sqrt{\ln (i+1)}\, d_i^{\downarrow {}} & \text {if } \ 2\le p \le q,\\ 0 & \text {if } \ q<p. \end{array}\right. } \end{aligned}$$
(1.10)
Fig. 1 (caption) The third summand in (1.10) and in Conjecture 1: northeast lines correspond to \(\mathbb {E}\max _{i \le m,j\le n} |a_{ij}g_{ij}|\); horizontal lines to \(\max _{j\le n}\sqrt{\ln (j+1)}\, b_j^{\downarrow {}}\); vertical lines to \(\max _{i\le m}\sqrt{\ln (i+1)}\, d_i^{\downarrow {}}\); northwest lines to 0. Note that the horizontal axis represents 1/p and the vertical one 1/q. Dense lines correspond to exact asymptotics and loosely spaced lines to upper and lower bounds matching up to logarithms

Let us now discuss the relation between the terms appearing above. We postpone the proofs of all the following claims to Sect. 5.

In the case \(p\le 2 \le q\), we have

$$\begin{aligned} D_1+D_2+ \mathbb {E}\max _{i \le m,j\le n} |a_{ij}g_{ij}| &\asymp _{p, q} D_1+D_2+\max _{i\le m, j\le n}\sqrt{\ln (j+1)}\, a_{ij}' \\ &\asymp _{p, q} D_1+D_2+\max _{i\le m, j\le n}\sqrt{\ln (i+1)}\, a_{ij}'', \end{aligned}$$
(1.11)

where the matrices \((a_{ij}')_{i,j}\) and \((a_{ij}'')_{i,j}\) are obtained by permuting the columns and rows, respectively, of the matrix \((|a_{ij}|)_{i,j}\) in such a way that \(\max _i a_{i1}'\ge \dots \ge \max _i a_{in}'\) and \(\max _j a_{1j}'' \ge \dots \ge \max _j a_{mj}''\). Therefore, in the range \(1\le p\le q \le \infty \) the right-hand side of (1.10) changes continuously with p and q (for a fixed matrix A).
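For concreteness, the rearranged matrices \((a_{ij}')_{i,j}\) and \((a_{ij}'')_{i,j}\) may be produced as follows (a short NumPy sketch; the function names are ours and purely illustrative).

```python
import numpy as np

def column_rearranged(A):
    # (a'_ij): columns of |A| permuted so that
    # max_i a'_{i1} >= ... >= max_i a'_{in}.
    Aabs = np.abs(A)
    return Aabs[:, np.argsort(-Aabs.max(axis=0))]

def row_rearranged(A):
    # (a''_ij): rows of |A| permuted so that
    # max_j a''_{1j} >= ... >= max_j a''_{mj}.
    Aabs = np.abs(A)
    return Aabs[np.argsort(-Aabs.max(axis=1)), :]
```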

Obviously, \( \max _{j\le n}\sqrt{\ln (j+1)} b_j^{\downarrow {}} \ge \max _{i\le m, j\le n}\sqrt{\ln (j+1)} a_{ij}' \) and, in general, the former quantity may be of larger order than the latter one. In Sect. 5.3 we shall present a more subtle relation: for every \(1\le p\le q< 2\) we shall give an example showing that the right-hand side of (1.10) may be of larger order than \(D_1+D_2+\mathbb {E}\max _{i \le m,j\le n} |a_{ij}g_{ij}| \). Note that by duality, i.e., the fact that

$$\begin{aligned} \Vert X_A:\ell _{p}^n\rightarrow \ell _q^m\Vert = \Vert (X_A)^T :\ell _{q^*}^m\rightarrow \ell _{p^*}^n \Vert =\Vert (X^T)_{A^T} :\ell _{q^*}^m\rightarrow \ell _{p^*}^n \Vert , \end{aligned}$$
(1.12)

the same holds in the case \(2< p\le q\). This suggests that the behavior of \(\mathbb {E}\Vert G_A:\ell _p^n\rightarrow \ell _q^m\Vert \) in the regions with horizontal or vertical lines is different from that in the region with northeast lines.

Moreover, we have

$$\begin{aligned} D_1+D_2 \gtrsim _{p,q} {\left\{ \begin{array}{ll} \max _{j\le n}\sqrt{\ln (j+1)}\, b_j^{\downarrow {}} & \text {if } q< p \text { and } q<2, \\ \max _{i\le m}\sqrt{\ln (i+1)}\, d_i^{\downarrow {}} & \text {if } q<p \text { and } p^*<2 \end{array}\right. } \end{aligned}$$
(1.13)

(see Sect. 5.2). Note that this is not the case for \(p \le q\), as one can easily see by considering, e.g., A equal to the identity matrix. This suggests a different, simplified behavior of \(\mathbb {E}\Vert G_A:\ell _p^n\rightarrow \ell _q^m\Vert \) in the region with northwest lines, compared to the other regions.

Given the discussion above, the lower bounds presented in (1.10), and the fact that they can be reversed for all \(p\in [1,\infty ]\), \(q\in \{1,\infty \}\) (and for all \(q\in [1,\infty ]\), \(p\in \{1,\infty \}\)), it is natural to conjecture the following.

Conjecture 1

For all \(1\le p, q \le \infty \), we conjecture that

$$\begin{aligned} \mathbb {E}\Vert G_A:\ell _{p}^n\rightarrow \ell _q^m\Vert \asymp _{p,q} D_1+D_2 + {\left\{ \begin{array}{ll} \mathbb {E}\max _{i \le m,j\le n} |a_{ij}g_{ij}| & \text {if }\ p\le 2\le q,\\ \max _{j\le n}\sqrt{\ln (j+1)}\, b_j^{\downarrow {}} & \text {if }\ p\le q\le 2,\\ \max _{i\le m}\sqrt{\ln (i+1)}\, d_i^{\downarrow {}} & \text {if } \ 2\le p \le q,\\ 0 & \text {if } \ q<p. \end{array}\right. } \end{aligned}$$
(1.14)

Remark 1.1

One could pose another natural conjecture, based on a potential generalization of the first equivalence in (1.4), namely that the two-sided bound

$$\begin{aligned} \mathbb {E}\Vert G_A:\ell _{p}^n\rightarrow \ell _q^m\Vert \asymp _{p,q} \mathbb {E}\max _{i\le m} \Vert (a_{ij}g_{ij})_j\Vert _{p^*} +\mathbb {E}\max _{j\le n} \Vert (a_{ij}g_{ij})_i\Vert _q \end{aligned}$$
(1.15)

holds for all \(1\le p,q \le \infty \). Indeed, the lower bound is true with constant \(\frac{1}{2}\), since for every deterministic matrix X one has

$$\begin{aligned} \Vert X:\ell _{p}^n\rightarrow \ell _q^m\Vert \ge \max \Bigl \{\max _{i\le m} \Vert (X_{ij})_j\Vert _{p^*},\max _{j\le n} \Vert (X_{ij})_i\Vert _q \Bigr \}. \end{aligned}$$

However, as we prove in Sect. 5.4, this conjecture is false: although the right-hand sides of (1.14) and (1.15) are comparable in the range \(1\le p \le 2\le q\le \infty \), for every pair (p, q) outside this range the right-hand side of (1.15) may be of smaller order than the left-hand side.

Let us now present a conjecture concerning the boundedness of linear operators given by infinite dimensional matrices. In what follows, we say that a matrix \(B= (b_{ij})_{i,j\in \mathbb {N}}\) defines a bounded operator from \(\ell _p(\mathbb {N})\) to \(\ell _q(\mathbb {N})\) if for all \(x \in \ell _p(\mathbb {N})\) the product Bx is well defined, belongs to \(\ell _q(\mathbb {N})\) and the corresponding linear operator is bounded.

Conjecture 2

Let \(A = (a_{ij})_{i,j\in \mathbb {N}}\) be an infinite matrix with real coefficients and let \(1\le p, q \le \infty \). We conjecture that the matrix \(G_A = (a_{ij}g_{ij})_{i,j\in \mathbb {N}}\) defines a bounded linear operator between \(\ell _p(\mathbb {N})\) and \(\ell _q(\mathbb {N})\) almost surely if and only if the matrix \(A\circ A\) defines a bounded linear operator between \(\ell _{p/2}(\mathbb {N})\) and \(\ell _{q/2}(\mathbb {N})\), the matrix \((A\circ A)^T\) defines a bounded linear operator between \(\ell _{q^*/2}(\mathbb {N})\) and \(\ell _{p^*/2}(\mathbb {N})\), and

  • in the case \(p\le 2\le q\), \(\mathbb {E}\sup _{i,j\in \mathbb {N}} |a_{ij}g_{ij}|<\infty \),

  • in the case \(p\le q\le 2\), \(\lim _{j\rightarrow \infty } b_j = 0\), and \(\sup _{j\in \mathbb {N}}\sqrt{\ln (j+1)} b_j^{\downarrow {}} <\infty \), where \(b_j = \Vert (a_{ij})_{i\in \mathbb {N}}\Vert _{2q/(2-q)}\), \(j\in \mathbb {N}\),

  • in the case \(2\le p \le q\), \(\lim _{i\rightarrow \infty } d_i = 0\), and \(\sup _{i\in \mathbb {N}}\sqrt{\ln (i+1)} d_i^{\downarrow {}} <\infty \), where \(d_i {:=}\Vert (a_{ij})_{j\in \mathbb {N}} \Vert _{2p/(p-2)}\), \(i\in \mathbb {N}\),

  • in the case \(q<p\), no additional condition is required.

We remark that it suffices to prove Conjecture 1 in order to confirm Conjecture 2.

Proposition 1.2

Assume \(1\le p, q\le \infty \). Then (1.14) for this choice of p, q implies the assertion of Conjecture 2 for the same choice of p, q.

We postpone the proof of this proposition to Subsection 5.5.

In this article, in addition to the cases \(p=q=2\) obtained in [40] and \(p=1, q\ge 2\) proved in [23, 45], we confirm Conjecture 1 when \(p\in \{1,\infty \}\), \(q\in [1,\infty ] \) and when \(q\in \{1,\infty \}\), \(p\in [1,\infty ] \). In all the other cases, we are able to prove the upper bounds only up to logarithmic (in the dimensions m, n) multiplicative factors (see Corollary 1.4 below). In particular, Proposition 1.2 implies that Conjecture 2 holds for all \(p\in \{1,\infty \}\), \(q\in [1,\infty ] \) and for all \(q\in \{1,\infty \}\), \(p\in [1,\infty ] \).

Note that in the structured case we work with, interpolating the results obtained for the boundary cases \(p\in \{1, \infty \}\) or \(q\in \{1, \infty \}\) gives a bound with polynomial (in the dimensions) multiplicative constants, which are much worse than the logarithmic factors from Corollary 1.4 below. However, as we shall see in Remark 4.2 below, interpolation techniques work well in the non-structured case.

1.3 Main results valid for \(1\le p, q\le \infty \)

We start with general theorems valid for the whole range of p, q. Results which are based on methods working only for specific values of p, q, but yielding better logarithmic terms, are presented in the next subsection. A brief summary and comparison of all results can be found in Table 1.

Before stating our main results, we need to introduce additional notation. For a non-empty set \(J\subset \{1,\ldots ,n\}\), and \(1\le p\le \infty \), we define

$$\begin{aligned} B_p^J{:=}\Bigl \{(x_j)_{j\in J}: \sum _{j\in J}|x_j|^p \le 1, \quad x_j\in \mathbb {R}\Bigr \}. \end{aligned}$$

By \(\ell _p^J\) we denote the space \(\mathbb {R}^J{:=}\bigl \{(x_j)_{j\in J}: x_j\in \mathbb {R}\bigr \}\) equipped with the norm

$$\begin{aligned} \Vert x\Vert _{\ell _p^J} = \Bigl (\sum _{j\in J}|x_j|^p \Bigr )^{1/p}, \end{aligned}$$

whose unit ball is \(B_p^J\). Obviously, the space \(\ell _p^J\) can be identified with a subspace of \(\ell _p^n\). If \(A:\ell _p^n\rightarrow \ell _q^m\) is a linear operator, the notation \(A:\ell _p^J\rightarrow \ell _q^I\) means that A is restricted to the space \(\ell _p^J\) and composed with a projection onto \(\ell _q^I\). Moreover, for \(x=(x_1,\ldots , x_n)\in \mathbb {R}^n\), \(\sup _{J} \Vert x\Vert _{\ell _p^J} = \bigl (\sum _{j\le k} |x_j^{\downarrow {}}|^p\bigr )^{1/p}\), where the supremum is taken over all \(J\subset \{1,\ldots ,n\}\) with \(|J|=k\), and \((x_1^{\downarrow {}}, \ldots ,x_n^{\downarrow {}})\) is the non-increasing rearrangement of \((|x_1|,\ldots , |x_n|)\).

Theorem 1.3

(Main theorem in a general version with sets \( I_0\), \(J_0\)) Assume that \(m\le M\), \(n\le N\), \(1\le p,q \le \infty \), and \(G=(g_{ij})_{i\le M, j\le N}\) has i.i.d. standard Gaussian entries. Then

$$\begin{aligned}&\mathbb {E}\sup _{I_0, J_0} \Vert G_A: \ell _p^{J_0} \rightarrow \ell _q^{I_0} \Vert =\mathbb {E}\sup _{I_0, J_0} \sup _{x\in B_p^ {J_0}} \sup _{y\in B_{q^*}^ {I_0}} \sum _{i\in I_0} \sum _{j\in J_0} y_i a_{ij}g_{ij}x_j \\&\quad \le \ln (en)^{1/p^*} \ln (em)^{1/q} \Bigl [ \bigl ( 2.4 \sqrt{\ln (mn)}+ 8 \sqrt{\ln M} + \sqrt{2/\pi }\bigr ) \sup _{I_0,J_0} \Vert A\mathbin {\circ }A :\ell ^ {J_0}_{p/2} \rightarrow \ell ^{ I_0}_{q/2}\Vert ^{1/2}\\&\qquad +\bigl (8\sqrt{\ln N } + 2\sqrt{2/\pi } \bigr ) \sup _{I_0,J_0} \Vert (A\mathbin {\circ }A)^T :\ell ^{ I_0}_{q^*/2} \rightarrow \ell ^ {J_0}_{p^*/2}\Vert ^{1/2} \Bigr ], \end{aligned}$$

where the suprema are taken over all sets \(I_0\subset \{1,\ldots ,M\}\), \(J_0\subset \{1,\ldots ,N\}\) such that \(|I_0|=m\), \(|J_0|=n\).

The above theorem gives an estimate on the largest operator norm among all submatrices of \(G_A\) of fixed size. Let us remark that apart from being of intrinsic interest, quantities of this type (for \(p=q=2\)) have appeared in connection with the study of the restricted isometry property of random matrices with independent rows [2] or in the analysis of entropic uncertainty principles for random quantum measurements [3, 47].

Let us now give an outline of the proof of Theorem 1.3. Note that

$$\begin{aligned} \Vert G_A: \ell _p^{J_0} \rightarrow \ell _q^{I_0} \Vert = \sup _{x\in B_p^ {J_0}} \sup _{y\in B_{q^*}^ {I_0}} \sum _{i\in I_0} \sum _{j\in J_0} y_i a_{ij}g_{ij}x_j. \end{aligned}$$
(1.16)

In the first step of our proof, we find polytopes L and K approximating (with accuracy depending logarithmically on the dimension) the unit balls in \(\ell _p^{J_0}\) and \(\ell _{q^*}^{I_0}\), respectively. The extreme points of the sets K and L have a special and simple structure: the absolute values of their non-zero coordinates are all equal to a constant depending only on the size of the support of a given point. Since K is close to \(B_{q^*}^{I_0}\) and L is close to \(B_p^{J_0}\), we may consider only \(x\in {\text {Ext}}(L), y\in {\text {Ext}}(K)\) in (1.16). Since the non-zero coordinates of \(x\in {\text {Ext}}(L)\) and \(y\in {\text {Ext}}(K)\), respectively, are all equal up to a sign, we may use a symmetrization argument and the contraction principle to remove x and y in the sum on the right-hand side of (1.16). Thus, in the next step of the proof we only need to estimate the expected value of

$$\begin{aligned} \sup _{I_0, J_0} \sup _{\emptyset \ne I\subset I_0}\sup _{\emptyset \ne J\subset J_0} |I|^{-1/q^*}|J|^{-1/p} \sum _{i\in I, j\in J} a_{ij}g_{ij}, \end{aligned}$$

where I and J represent the potential supports of points in \({\text {Ext}}(K)\) and \({\text {Ext}}(L)\). To deal with this quantity, we first consider the suprema over the subsets of fixed sizes and use Slepian’s lemma to compare the supremum of the Gaussian process above with the supremum of another Gaussian process, which may be bounded easily. Then we make use of the term \(|I|^{-1/q^*}|J|^{-1/p}<1\), which allows us to go back to suprema over the sets \(B_p^ {J_0}\) and \(B_{q^*}^ {I_0}\). At the end, we use the Gaussian concentration inequality to unfix the sizes of sets I and J and complete the proof.
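To get a feeling for the quantity displayed above, one can evaluate it exhaustively for very small matrices. The following sketch (NumPy; brute force over all pairs of non-empty subsets, so feasible only for tiny m and n, and the parameter values are illustrative) computes it directly.

```python
import numpy as np
from itertools import combinations

def reduced_sup(B, p, qstar):
    # Exhaustive evaluation of
    #   sup_{I,J nonempty} |I|^{-1/qstar} |J|^{-1/p} sum_{i in I, j in J} b_ij
    # for a small matrix B; the cost is exponential in m and n,
    # so this is purely illustrative.
    m, n = B.shape
    best = -np.inf
    for k in range(1, m + 1):
        for I in combinations(range(m), k):
            colsums = B[list(I), :].sum(axis=0)
            for l in range(1, n + 1):
                for J in combinations(range(n), l):
                    val = colsums[list(J)].sum() / (k ** (1 / qstar) * l ** (1 / p))
                    best = max(best, val)
    return best

rng = np.random.default_rng(2)
print(reduced_sup(rng.standard_normal((6, 6)), p=1.5, qstar=3.0))
```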

Applying Theorem 1.3 with \(N=n\), \(M=m\) immediately yields the following result, which confirms Conjecture 1 up to some logarithmic terms.

Corollary 1.4

(Main theorem – \(\ell _p\) to \(\ell _q\) version) Assume that \(1\le p,q \le \infty \) and \(G=(g_{ij})_{i\le m, j\le n}\) has i.i.d. standard Gaussian entries. Then,

$$\begin{aligned} \mathbb {E}\Vert G_A :\ell ^n_p\rightarrow \ell ^m_q\Vert&\lesssim (\ln n)^{1/p^*} (\ln m)^{1/q} \Bigl [ \sqrt{\ln (mn)} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}\\&\quad +\sqrt{\ln n} \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2} \Bigr ]. \end{aligned}$$

Moreover, we easily recover the same bound in the case of independent bounded entries. We state and prove a general version with sets \(I_0\) and \(J_0\) akin to Theorem 1.3 in Sect. 3.2.

Corollary 1.5

Assume that \(1\le p,q \le \infty \) and \(X=(X_{ij})_{i\le m, j\le n}\) has independent mean-zero entries taking values in \([-1,1]\). Then

$$\begin{aligned} \mathbb {E}\Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert&\lesssim (\ln n)^{1/p^*} (\ln m)^{1/q} \Bigl [ \sqrt{\ln (mn)} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}\\&\quad +\sqrt{\ln n} \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2} \Bigr ]. \end{aligned}$$

We use the two results above to obtain their analogue in the case of \(\psi _r\) entries for \(r\le 2\); these random variables are defined by (1.17).

This class contains, among others,

  • log-concave random variables (which are \(\psi _1\)),

  • heavy-tailed Weibull random variables (of shape parameter \(r\in (0,1)\), i.e., \(\mathbb {P}(|X_{ij}|\ge t)=e^{-t^r/L}\) for \(t\ge 0\)),

  • random variables satisfying the condition

    $$\begin{aligned} \Vert X_{ij}\Vert _{2\rho } \le \alpha \Vert X_{ij}\Vert _{\rho } \qquad \text {for all } \rho \ge 1. \end{aligned}$$

    These random variables are \(\psi _r\) with \(r=1/\log _2\alpha \). They were considered recently in [38].

A general version of the following Corollary 1.6 with sets \(I_0\) and \(J_0\) is stated and proved in Subsection 3.2.

Corollary 1.6

Assume that \(K,L >0\), \(r\in (0,2]\), \(1\le p,q \le \infty \), and \(X=(X_{ij})_{i\le m, j\le n}\) has independent mean-zero entries satisfying

$$\begin{aligned} \mathbb {P}(|X_{ij}|\ge t) \le Ke^{-t^r/L} \qquad \text {for all } t\ge 0, i\le m, j\le n. \end{aligned}$$
(1.17)

Then

$$\begin{aligned}&\mathbb {E}\Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert \\&\quad \lesssim _{r,K,L} (\ln n)^{1/p^*} (\ln m)^{1/q}\ln (mn)^{\frac{1}{r} - \frac{1}{2}} \Bigl [ \sqrt{\ln (mn)} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}\\&\qquad +\sqrt{\ln n} \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2} \Bigr ]. \end{aligned}$$
Table 1 (caption) In several cases we prove a result of the type

$$\begin{aligned} \mathbb {E}\Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert \lesssim _{p,q} L_0\Bigl ( L_1 \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2} + L_2 \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2} \Bigr ), \end{aligned}$$

where \(L_0, L_1, L_2\) are logarithmic terms depending only on n, m and on \(p,q \in [1,\infty ]\). The table presents a summary of the results; the expression ‘set K’ refers to the use of Lemma 2.3, and \(L_{K}{:=}\ln (n)^{1/p^*}\ln (m)^{1/q}\).

1.4 Results for particular ranges of p, q

We continue with results for some specific ranges of p, q, where we are able to prove estimates with better logarithmic dependence (results which follow from them by duality (1.12) are stated in Table 1 to keep the presentation short). We postpone their proofs to Sect. 4. We start with the case of Gaussian random variables. Recall that \(\gamma _q = (\mathbb {E}|g|^q)^{1/q}\), where g is a standard Gaussian random variable.

Proposition 1.7

For all \(1\le p\le 2\) and \(1\le q< \infty \), we have

$$\begin{aligned} \mathbb {E}\Vert G_A :\ell ^n_p\rightarrow \ell ^m_q\Vert&\le \gamma _q \ln (en)^{1/p^*} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}\nonumber \\&\quad + 2.2\ln (en)^{1/2+1/p^*} \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2}. \end{aligned}$$
(1.18)

If \(q=1\) or \(p=\infty \), then we are able to get a result without logarithmic terms. Recall that for a sequence \((x_j)_{j\le n}\) we denote by \((x_j^{\downarrow {}})_{j\le n}\) the non-increasing rearrangement of \((|x_j|)_{j\le n}\).

Proposition 1.8

(i) For \(1< p \le \infty \), we have

$$\begin{aligned} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{1/2}\Vert ^{1/2} +\Vert (A\mathbin {\circ }A)^T :\ell ^m_{\infty } \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2} &\lesssim \mathbb {E}\Vert G_A :\ell ^n_p\rightarrow \ell ^m_1\Vert \\ &\le \gamma _1\Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{1/2}\Vert ^{1/2} + 2 \gamma _{p^*} \Vert (A\mathbin {\circ }A)^T :\ell ^m_{\infty } \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2}. \end{aligned}$$
(ii) Moreover,

    $$\begin{aligned} \mathbb {E}\Vert G_A :\ell ^n_1\rightarrow \ell ^m_1\Vert \asymp \Vert A\mathbin {\circ }A :\ell ^n_{1/2} \rightarrow \ell ^m_{1/2}\Vert ^{1/2} + \max _{j\le n}\sqrt{\ln (j+1)} b_j^{\downarrow {}}, \end{aligned}$$

    where \(b_j {:=}\Vert (a_{ij})_{i\le m}\Vert _2\), \(j\le n\).

Note that (ii) shows in particular that a blow-up of the constant \(\gamma _{p^*}\) in the upper estimate (i) as \(p\rightarrow 1\) is necessary, since the rightmost summands in (i) and (ii) are non-comparable.

Remark 1.9

It shall be clear from the proof that the upper bound in part (i) of Proposition 1.8 remains valid for any random matrix X (instead of G) with independent isotropic rows (i.e., rows with mean zero and the covariance matrix equal to the identity) such that

$$\begin{aligned} \Bigl ( \mathbb {E}\Bigl |\sum _{i=1}^m \alpha _{i}X_{ij}\Bigr |^{p^*}\Bigr )^{1/p^*} \lesssim _p \Bigl (\sum _{i=1}^m \alpha _{i}^2\Bigr )^{1/2} \qquad \text {for all } \alpha \in \mathbb {R}^m,\ j\le n. \end{aligned}$$
(1.19)

Note that the independence and the isotropicity of rows imply that also the columns of X are isotropic (since the coordinates of every column are independent and have mean zero and variance 1). Therefore, whenever \(p\ge 2\), condition (1.19) is always satisfied (because the \(p^*\)-integral norm is bounded above by the 2-integral norm, which is then equal to the right-hand side of (1.19), since the covariance matrix of each column is equal to the \(m\times m\) identity matrix).

The following proposition generalizes part (ii) of Proposition 1.8 to an arbitrary \(q \le 2\). We list it separately since we present a proof using different arguments. Recall that the case \(p=1\), \(q\ge 2\) was established before, see (1.6).

Proposition 1.10

If \(1\le q \le 2\), then

$$\begin{aligned} \mathbb {E}\Vert G_A:\ell _1^n \rightarrow \ell _q^m\Vert &\asymp \Vert A\mathbin {\circ }A:\ell _{1/2}^n \rightarrow \ell _{q/2}^m\Vert ^{1/2} + \max _{j\le n} \bigl (\sqrt{\ln (j+1)}\, b_j^{\downarrow {}}\bigr )\\ &= \max _{j\le n} \Vert (a_{ij})_{i\le m}\Vert _q + \max _{j\le n} \bigl (\sqrt{\ln (j+1)}\, b_j^{\downarrow {}}\bigr ), \end{aligned}$$

where \(b_j = \Vert (a_{ij})_{i\le m}\Vert _{2q/(2-q)}\) for \(j\le n\).

Proposition 1.10 immediately implies its dual version.

Corollary 1.11

If \(2\le p \le \infty \), then

$$\begin{aligned} \mathbb {E}\Vert G_A:\ell _p^n \rightarrow \ell _\infty ^m\Vert &\asymp \Vert (A\mathbin {\circ }A)^T:\ell _{1/2}^m \rightarrow \ell _{p^*/2}^n\Vert ^{1/2} + \max _{i\le m} \bigl (\sqrt{\ln (i+1)}\, d_i^{\downarrow {}}\bigr )\\ &= \max _{i\le m} \Vert (a_{ij})_{j\le n}\Vert _{p^*} + \max _{i\le m} \bigl (\sqrt{\ln (i+1)}\, d_i^{\downarrow {}}\bigr ), \end{aligned}$$

where \(d_i= \Vert (a_{ij})_{j\le n}\Vert _{2p^*/(2-p^*)} = \Vert (a_{ij})_{j\le n}\Vert _{2p/(p-2)} \) for \(i\le m\).

Remark 1.12

Corollary 1.11 and the dual version of (1.6) provide the exact behavior of the expected norm of a Gaussian operator from \(\ell _p^n\) to \(\ell _q^m\) not only when \(q=\infty \), but also for \(q\ge c_0 \ln m\), as we explain now. For all \(q\ge q_0{:=}c_0 \ln m\) we have the following inequalities for norms on \(\mathbb {R}^m\),

$$\begin{aligned} \Vert \cdot \Vert _q\ge \Vert \cdot \Vert _\infty \ge m^{-1/q_0} \Vert \cdot \Vert _{q_0} = e^{-1/c_0} \Vert \cdot \Vert _{q_0} \ge e^{-1/c_0} \Vert \cdot \Vert _q, \end{aligned}$$

therefore,

$$\begin{aligned} \frac{1}{e^{1/c_0}}\mathbb {E}\Vert X_A:\ell _{p}^n\rightarrow \ell _{q }^{m}\Vert \le \mathbb {E}\Vert X_A:\ell _{p}^n\rightarrow \ell _\infty ^{m}\Vert \le \mathbb {E}\Vert X_A:\ell _{p}^n\rightarrow \ell _{q }^{m}\Vert . \end{aligned}$$

Similarly,

$$\begin{aligned} \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert&\asymp _{c_0} \Vert (A\mathbin {\circ }A)^T :\ell ^m_{1/2} \rightarrow \ell ^n_{p^*/2}\Vert . \end{aligned}$$

Proposition 1.7 implies the following estimate for matrices with independent \(\psi _r\) entries, in the same way as Corollary 1.4 implies Corollary 1.6 (see Sect. 3.2).

Corollary 1.13

Assume that \(K,L >0\), \(r\in (0,2]\), and \(X=(X_{ij})_{i\le m, j\le n}\) has independent mean-zero entries satisfying

$$\begin{aligned} \mathbb {P}(|X_{ij}|\ge t) \le Ke^{-t^r/L} \qquad \text {for all } t\ge 0. \end{aligned}$$
(1.20)

Then, for \(1\le p\le 2\), \(1\le q \le \infty \),

$$\begin{aligned} \mathbb {E}\Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert&\lesssim _{r,K,L} (\ln n)^{1/p^*}\ln (nm)^{1/r - 1/2} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}\nonumber \\&\quad + (\ln n)^{1/2+1/p^*}\ln (nm)^{1/r - 1/2} \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2}. \end{aligned}$$

By Hoeffding’s inequality (i.e., Lemma 2.13) we know that matrices with independent mean-zero entries taking values in \([-1,1]\) satisfy (1.20) with \(r=2\) and \(K=2=L\). In this special case of independent bounded random variables, one can also adapt the methods of [9] to prove, in the smaller range \(1\le p\le 2 \le q < \infty \), the following result with explicit numerical constants and improved dependence on n (note that the second logarithmic term is better than in Corollary 1.13, where the exponent equals \(1/2+1/p^*\)).

Proposition 1.14

Assume that \(X=(X_{ij})_{i\le m, j\le n}\) has independent mean-zero entries taking values in \([-1,1]\). Then, for \(1\le p\le 2\le q< \infty \),

$$\begin{aligned} \mathbb {E}\Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert&\le C(q) \ln (en)^{1/p^*} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2} \\&\quad + 10^{1/q} \ln (en)^{1/q+1/p^*} \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2} , \end{aligned}$$

where \(C(q) {:=}2 (q\Gamma (q/2))^{1/q} \asymp \sqrt{q}\).

Finally, we have the following general result for matrices with independent \(\psi _r\) entries (cf. Corollary 1.6).

Theorem 1.15

Let \(K,L>0\), \(r\in (0,2]\), and assume that \(X=(X_{ij})_{i\le m, j\le n}\) has independent mean-zero entries satisfying

$$\begin{aligned} \mathbb {P}(|X_{ij}|\ge t) \le Ke^{-t^r/L} \quad \text {for all } t\ge 0. \end{aligned}$$

Then, for all \(1\le p\le 2\) and \(1\le q< \infty \),

$$\begin{aligned} \mathbb {E}\Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert&\lesssim _{r,K,L} \ q^{1/r} (\ln n)^{1/p^*} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}\\&\quad \ \ \ + (\ln n)^{1/2+1/p^*} \ln (mn)^{1/r} \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2}. \end{aligned}$$

Having in mind the strategy of proof described after Theorem 1.3, let us elaborate on the idea of the proof of Theorem 1.15. We shall split the matrix X into two parts \(X^{(1)}\) and \(X^{(2)}\), which we treat separately. In our decomposition, all entries of \(X^{(1)}\) are bounded by \(C\ln (mn)^{1/r}\) and the probability that \(X^{(2)} \ne 0\) is very small. We shall deal with \(X^{(2)}\) using a crude bound (Lemma 4.3) and the fact that the probability that \(X^{(2)} \ne 0\) is small enough to compensate for it. In order to bound the expectation of the norm of \(X^{(1)}\), we require a cut-off version of Theorem 1.15 (Lemma 4.4). To obtain it, we shall replace \(B_p^n\) in the expression for the operator norm with a suitable polytope K (and leave \(\sup _{y\in B_{q^*}^m}\) as it is) and then apply a Gaussian-type concentration inequality to the function \(Z\mapsto F(Z) {:=}\Vert Z_A x\Vert _q\) for \(x\in {\text {Ext}}(K)\).
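In code, the decomposition reads as follows (a NumPy sketch; the threshold constant C is an arbitrary illustrative choice rather than the constant from the actual proof, and the symmetric Weibull entries are one admissible \(\psi _r\) model).

```python
import numpy as np

def truncation_split(X, r, C=4.0):
    # Split X = X1 + X2 as in the proof sketch: X1 has all entries bounded
    # by the threshold C * ln(mn)^{1/r}, while X2 = X - X1 is nonzero only
    # on the rare large entries.
    m, n = X.shape
    thr = C * np.log(m * n) ** (1.0 / r)
    X1 = np.clip(X, -thr, thr)
    return X1, X - X1

rng = np.random.default_rng(3)
r, m, n = 0.5, 200, 300
# symmetric Weibull(r) entries: sign * Exp(1)^{1/r}, so P(|X_ij| >= t) = exp(-t^r)
X = rng.choice([-1.0, 1.0], size=(m, n)) * rng.exponential(size=(m, n)) ** (1 / r)
X1, X2 = truncation_split(X, r)
print(np.abs(X1).max(), float(np.mean(X2 != 0)))  # bounded part, rare part
```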

1.5 Tail bounds

All the bounds for \(\mathbb {E}\Vert X_A:\ell _p^n \rightarrow \ell _q^m\Vert \) provided in this work for random matrices X also yield a tail bound for \(\Vert X_A:\ell _p^n \rightarrow \ell _q^m\Vert \). (It is clear from the proof of Proposition 1.16—see Sect. 3.2—that the same applies to the estimates for \(\sup _{I_0, J_0} \Vert G_A: \ell _p^{J_0} \rightarrow \ell _q^{I_0} \Vert \), but we omit the details to keep the presentation clear.)

Proposition 1.16

(Tail bound) Assume that \(K,L \ge 1\), \(r\in (0,2]\), \(1\le p,q \le \infty \), and \(\gamma \ge 1\). Fix a deterministic \(m\times n\) matrix A and assume that

$$\begin{aligned} D \ge \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}. \end{aligned}$$

If for all random matrices \(X=(X_{ij})_{i\le m, j\le n}\) with independent mean-zero entries satisfying

$$\begin{aligned} \mathbb {P}(|X_{ij}|\ge t) \le Ke^{-t^r/L} \qquad \text {for all } t\ge 0,\, i\le m,\, j\le n, \end{aligned}$$
(1.21)

we have

$$\begin{aligned} \mathbb {E}\Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert \le \gamma D, \end{aligned}$$
(1.22)

then, for all random matrices with independent mean-zero entries satisfying (1.21), we also have

$$\begin{aligned} \bigl ( \mathbb {E}\Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert ^\rho \bigr )^{1/\rho } \lesssim _{r,K,L} \ \rho ^{1/r}\gamma D \qquad \text {for all } \rho \ge 1, \end{aligned}$$
(1.23)

and, for all \(t>0\),

$$\begin{aligned} \mathbb {P}\bigl ( \Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert \ge t\gamma D \bigr ) \le C(r,K,L) \exp \bigl ( -t^r/C(r,K,L)\bigr ). \end{aligned}$$
(1.24)

Note that random variables taking values in \([-1,1]\) satisfy condition (1.21) with \(r=2\), \(K=e\), and \(L=1\). Thus, Proposition 1.16 applies also in the setting of bounded or Gaussian entries.

1.6 Organization of the paper

In Sect. 2 we gather various preliminary results we shall use in the sequel. Section 3 contains the proofs of the main results valid for all p, q (i.e., Theorem 1.3 and its corollaries) and the tail bound from Proposition 1.16. In Sect. 4 we prove the results for specific choices/ranges of p, q. In Sect. 5 we prove lower bounds on expected operator norms, showing in particular that our estimates are optimal up to logarithmic factors. We also prove other results justifying the proposed form of Conjecture 1. The last subsection of Sect. 5 is devoted to infinite dimensional Gaussian operators.

2 Preliminaries

2.1 General facts

We start with some easy lemmas which will be used repeatedly throughout the paper.

Lemma 2.1

For any real \(m\times n\) matrix \(B = (b_{ij})_{i\le m, j\le n}\) and \(0<r\le 1\le s\le \infty \), we have

$$\begin{aligned} \Vert B :\ell ^n_{r} \rightarrow \ell ^m_{s} \Vert =\Vert B:\ell ^n_{1} \rightarrow \ell ^m_{s} \Vert = \max _{j\le n} \Vert (b_{ij})_{i=1}^m\Vert _s. \end{aligned}$$

Furthermore, for a real \(m\times n\) matrix \(A = (a_{ij})_{i\le m, j\le n}\) and \(1\le p\le 2\), \(p \le q\le \infty \),

$$\begin{aligned} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2} \Vert ^{1/2} = \max _{j\le n} \Vert (a_{ij})_{i=1}^m\Vert _q. \end{aligned}$$

Proof

Since \(0<r\le 1\), we have \({\text {conv}}B_r^n = B_1^n\), where \({\text {conv}}S\) denotes the convex hull of the set S. Moreover, the extreme points of \(B_1^n\) are the signed standard unit vectors, i.e., \(\pm e_1,\dots ,\pm e_n\), and \(z\mapsto \Vert z\Vert _s\) is a convex function (since \(s\ge 1\)). Thus,

$$\begin{aligned} \sup _{x\in B_r^n} \Vert Bx\Vert _s = \sup _{x\in {\text {conv}}B_r^n} \Vert Bx\Vert _s = \sup _{x\in B_1^n}\Vert Bx\Vert _s = \max _{1\le j \le n} \Vert Be_j\Vert _s = \max _{1\le j \le n} \Vert (b_{ij})_{i=1}^m\Vert _s. \end{aligned}$$

This immediately implies the result for the Hadamard product \(A\circ A=:B\) if \(1\le p\le 2\le q\le \infty \).

If, on the other hand, \(1\le p\le q\le 2\), then by the subadditivity of the function \(t\mapsto |t|^{q/2}\),

$$\begin{aligned} \Vert A\mathbin {\circ }A:\ell _{p/2}^n \rightarrow \ell _{q/2}^m\Vert ^{q/2}&= \sup _{x \in B_{p/2}^n} \sum _{i=1}^m \Bigl |\sum _{j=1}^n a_{ij}^2 x_j\Bigr |^{q/2} \le \sup _{x \in B_{p/2}^n} \sum _{i=1}^m \sum _{j=1}^n |a_{ij}|^{q} |x_j|^{q/2} \\&= \Vert (|a_{ij}|^q)_{i\le m,j\le n}:\ell _{p/q}^n\rightarrow \ell _{1}^m\Vert = \max _{j\le n} \Vert (a_{ij})_{i\le m}\Vert _q^q, \end{aligned}$$

where in the last equality we used the first part of the Lemma. Since we clearly have

$$\begin{aligned} \Vert A\mathbin {\circ }A:\ell _{p/2}^n \rightarrow \ell _{q/2}^m\Vert \ge \max _{j\le n}\Vert (a_{ij}^2)_{i\le m}\Vert _{q/2} = \max _{j\le n}\Vert (a_{ij})_{i\le m}\Vert _{q}^2, \end{aligned}$$

we thus obtain

$$\begin{aligned} \Vert A\mathbin {\circ }A:\ell _{p/2}^n \rightarrow \ell _{q/2}^m\Vert ^{1/2} = \max _{j\le n}\Vert (a_{ij})_{i\le m}\Vert _{q}. \end{aligned}$$

\(\square \)
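A quick numerical sanity check of the first part of the lemma (NumPy; the dimensions and the choice \(r=0.7\), \(s=3\) are illustrative): a random search over the \(\ell _r\) sphere never exceeds the largest column \(\ell _s\) norm.

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.uniform(size=(20, 30))
r, s = 0.7, 3.0                  # 0 < r <= 1 <= s

# value predicted by Lemma 2.1: the largest l_s norm of a column of B
exact = np.linalg.norm(B, ord=s, axis=0).max()

# random points on the l_r sphere should never beat that value
X = rng.standard_normal((30, 10000))
X /= np.sum(np.abs(X) ** r, axis=0) ** (1 / r)   # normalize: ||x||_r = 1
searched = np.linalg.norm(B @ X, ord=s, axis=0).max()
print(exact, searched, bool(searched <= exact + 1e-9))
```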

Definition 2.2

A set \(K\subset \mathbb {R}^n\) is called unconditional if for every \((x_j)_{j\le n}\in K\) and every \((\varepsilon _j)_{j\le n} \in \{-1,1\}^n\) we have \((\varepsilon _ j x_j)_{j\le n} \in K\).

We shall use the following version of [49, Lemma 2.1].

Lemma 2.3

Assume that \(1\le p\le \infty \), \(n\in \mathbb {N}\), and define the convex set

$$\begin{aligned} K{:=}{\text {conv}}\Bigl \{ \frac{1}{|J|^{1/p}}\bigl ( \varepsilon _j {\textbf{1}}_{\{j\in J\}} \bigr )_{j=1}^n: J\subset \{1,\dots ,n\}, J\ne \emptyset , (\varepsilon _j )_{j=1}^n\in \{-1,1\}^n\Bigr \}. \end{aligned}$$

Then \(B_p^n \subset \ln (en)^{1/p^*} K\).

Proof

Fix a vector \(x=(x_1,\dots ,x_n)\in \mathbb {R}^n\). We want to prove that \(\Vert x\Vert _K \le \ln (en)^{1/p^*} \Vert x\Vert _p\), where

$$\begin{aligned} \Vert x\Vert _K = \inf \{\lambda >0 :x\in \lambda K\} \end{aligned}$$

denotes the norm generated by K, i.e., its Minkowski gauge. Since both K and \(B_p^n\) are permutationally invariant and unconditional (see Definition 2.2), we may and will assume that \(x_1\ge \dots \ge x_n\ge 0\). If we put \(x_{n+1}{:=}0\), then

$$\begin{aligned} x = \sum _{j=1}^n x_j e_j = \sum _{j=1}^n (x_j-x_{j+1}) (e_1+\dots + e_j). \end{aligned}$$

Since \(\Vert e_1+\dots +e_j\Vert _K = j^{1/p}\) for \(1\le j \le n\), the triangle and Hölder inequalities yield

$$\begin{aligned} \Vert x\Vert _K&\le \sum _{j=1}^n (x_j-x_{j+1}) j^{1/p} = \sum _{j=1}^n x_j (j^{1/p}-(j-1)^{1/p}) \\&\le \sum _{j=1}^n x_j j^{1/p - 1} \le \Vert x\Vert _p \Bigl (\sum _{j=1}^n \frac{1}{j}\Bigr )^{1/p^*} \le \Vert x\Vert _p \ln (en)^{1/p^*}, \end{aligned}$$

where we also used the elementary estimates \(j^{1/p}-(j-1)^{1/p}\le j^{\frac{1}{p}-1}\) and \(\sum _{j=1}^n \frac{1}{j}\le 1+\int _1^n \frac{1}{t} dt = \ln (en)\). This completes the proof. \(\square \)
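The chain of estimates in the proof is easy to check numerically. The sketch below (NumPy; n, p, and the random input are illustrative) verifies that the telescoping bound from the proof indeed does not exceed \(\ln (en)^{1/p^*}\Vert x\Vert _p\).

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 500, 1.3
pstar = p / (p - 1)

x = np.sort(np.abs(rng.standard_normal(n)))[::-1]   # x_1 >= ... >= x_n >= 0
j = np.arange(1, n + 1)
# telescoping bound on ||x||_K from the proof of Lemma 2.3
telescoping = np.sum(x * (j ** (1 / p) - (j - 1) ** (1 / p)))
claimed = np.log(np.e * n) ** (1 / pstar) * np.sum(x ** p) ** (1 / p)
print(telescoping, claimed, bool(telescoping <= claimed))
```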

Remark 2.4

The term \(\ln (en)^{1/p^*}\) can be replaced by \(1+\frac{1}{p} \ln (en)^{1/p^*}\) by writing in the above proof

$$\begin{aligned} \sum _{j=1}^n x_j (j^{1/p}-(j-1)^{1/p})&\le x_1+\frac{1}{p}\sum _{j=2}^n x_j (j-1)^{\frac{1}{p} - 1} \le \Vert x\Vert _p \Bigl (1+\frac{1}{p} \Bigl (\sum _{j=1}^{n-1} \frac{1}{j}\Bigr )^{1/p^*} \Bigr ). \end{aligned}$$

Here we used the estimates \(j^{1/p}-(j-1)^{1/p}\le \frac{1}{p} (j-1)^{\frac{1}{p}-1}\) for \(j>1\) (which follows from the concavity of the function \(t\mapsto t^{1/p}\)) and the trivial one \(x_1\le \Vert x\Vert _p\).

Remark 2.5

The constant \((\ln n)^{1/p^*}\) in Lemma 2.3 is sharp up to a constant depending on p for every \(1\le p<\infty \) (when \(p=\infty \), \(K=B_p^n\) and the constant depending on p degenerates as \(p \rightarrow \infty \)). More precisely, we shall prove that if \(B_p^n \subset C(p,n) K\), then \(C(p,n) \gtrsim _p (\ln n)^{1/p^*}\). Note that \(B_p^n \subset C(p,n) K\) if and only if

$$\begin{aligned} \Vert \cdot \Vert _{p^*}\le C(p,n) \Vert \cdot \Vert _{K}^*, \end{aligned}$$
(2.1)

where \(\Vert \cdot \Vert _K^*\) is the norm dual to \(\Vert \cdot \Vert _K\).

Let \({\text {Ext}}K \) be the set of extreme points of K, and let \((y_j^{\downarrow {}})_{j\le n}\) be the non-increasing rearrangement of \((|y_j|)_{j\le n}\). For every \(y\in \mathbb {R}^n\),

$$\begin{aligned} \Vert y\Vert _K^*&= \sup _{x\in K} \sum _{j=1}^n x_jy_j = \sup _{x\in {\text {Ext}}K} \sum _{j=1}^n x_jy_j =\sup _{J\subset [n], J\ne \emptyset } \sum _{j\in J} |y_j| \frac{1}{|J|^{1/p}} \\&=\sup _{k \le n} \sum _{j=1}^k y_j^{\downarrow {}} \frac{1}{k^{1/p}}. \end{aligned}$$

Assume that \(p^*\ne 1\) and put \(y_j {:=}j^{-1/p^*}\). We get

$$\begin{aligned} \Vert y\Vert _K^*=\sup _{k\le n} \sum _{j=1}^k j^{-1/p^*} \frac{1}{k^{1/p}} \asymp _p \sup _{k\le n} k^{1-\frac{1}{p^*}} \frac{1}{k^{1/p}} =1, \end{aligned}$$

whereas

$$\begin{aligned} \Vert y\Vert _{p^*} = \Bigl ( \sum _{j=1}^n j^{-1} \Bigr )^{1/p^*} \asymp (\ln n)^{1/p^*}, \end{aligned}$$

so inequality (2.1) yields that \(C(p,n) \gtrsim _p (\ln n)^{1/p^*}\).
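The computation above can be reproduced numerically (a NumPy sketch; n and p are illustrative): for \(y_j=j^{-1/p^*}\), the dual norm \(\Vert y\Vert _K^*\) stays bounded while \(\Vert y\Vert _{p^*}\) grows like \((\ln n)^{1/p^*}\).

```python
import numpy as np

def dual_K_norm(y, p):
    # ||y||_K^* = max_k (sum of the k largest |y_j|) / k^{1/p},
    # the formula computed in Remark 2.5.
    y_down = np.sort(np.abs(y))[::-1]
    k = np.arange(1, len(y) + 1)
    return float((np.cumsum(y_down) / k ** (1 / p)).max())

n, p = 10**5, 1.5
pstar = p / (p - 1)
y = np.arange(1, n + 1) ** (-1 / pstar)
print(dual_K_norm(y, p),                     # stays O(1)
      np.sum(y ** pstar) ** (1 / pstar),     # ~ (ln n)^{1/p*}
      np.log(n) ** (1 / pstar))
```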

We shall also need the following standard lemma (see, e.g., [41, Sect. 1.3]). We will use the versions with \(r=1\) and \(r=2\).

Lemma 2.6

Let Z be a nonnegative random variable. If there exist \(a \ge 0\), \(b, \alpha , \beta , s_0> 0\), and \(r\ge 1\) such that

$$\begin{aligned} \mathbb {P}(Z\ge a + b s) \le \alpha e^{-\beta s^r} \quad \text {for } s\ge s_0, \end{aligned}$$

then

$$\begin{aligned} \mathbb {E}Z \le a + b \Bigl (s_0 + \alpha \frac{e^{-\beta s_0^r}}{r\beta s_0^{r-1}}\Bigr ). \end{aligned}$$

Proof

Integration by parts yields

$$\begin{aligned} \mathbb {E}Z&\le a + bs_0 + \int _{a+b s_0}^\infty \mathbb {P}(Z\ge u) du = a + bs_0 + b\int _{s_0}^\infty \mathbb {P}(Z\ge a+b s) ds \\&\le a + bs_0 + b\alpha \int _{s_0}^\infty e^{-\beta s^r} ds\\&\le a + bs_0 + \frac{b\alpha }{r\beta s_0^{r-1}}\int _{s_0}^\infty r\beta s^{r-1} e^{-\beta s^r} ds = a + b \Bigl (s_0 + \alpha \frac{e^{-\beta s_0^r}}{r\beta s_0^{r-1}}\Bigr ). \end{aligned}$$

\(\square \)

2.2 Contraction principles

Below we recall the well-known contraction principle due to Kahane and its extension by Talagrand (see, e.g., [64, Exercise 6.7.7] and [43, Theorem 4.4 and the proof of Theorem 4.12]).

Lemma 2.7

(Contraction principle) Let \((X,\Vert \cdot \Vert )\) be a normed space, \(n\in \mathbb {N}\), and \(\rho \ge 1\). Assume that \(x_1,\dots ,x_n\in X\) and \(\alpha {:=}(\alpha _1,\dots ,\alpha _n)\in \mathbb {R}^n\). Then, if \(\varepsilon _1,\dots ,\varepsilon _n\) are independent Rademacher random variables, we have

$$\begin{aligned} \mathbb {E}\bigl \Vert \sum _{i=1}^n \alpha _i\varepsilon _ix_i \bigr \Vert ^{\rho } \le \Vert \alpha \Vert _\infty ^{ \rho }\, \mathbb {E}\bigl \Vert \sum _{i=1}^n \varepsilon _ix_i \bigr \Vert ^{ \rho }. \end{aligned}$$

Lemma 2.8

(Contraction principle) Let T be a bounded subset of \(\mathbb {R}^n\). Assume that \(\varphi _i:\mathbb {R}\rightarrow \mathbb {R}\) are 1-Lipschitz and \(\varphi _i(0)=0\) for \(i=1,\ldots ,n\). Then, if \(\varepsilon _1,\dots ,\varepsilon _n\) are independent Rademacher random variables, we have

$$\begin{aligned} \mathbb {E}\sup _{t\in T}\sum _{i=1}^n \varepsilon _i\varphi _i(t_i) \le \mathbb {E}\sup _{t\in T}\sum _{i=1}^n \varepsilon _it_i. \end{aligned}$$

2.3 Gaussian random variables

The following result is fundamental to the theory of Gaussian processes and referred to as Slepian’s inequality or Slepian’s lemma [52]. We use the following (slightly adapted) version taken from [11, Theorem 13.3].

Lemma 2.9

(Slepian’s lemma) Let \((X_t)_{t\in T}\) and \((Y_t)_{t\in T}\) be two Gaussian random vectors satisfying \(\mathbb {E}[X_t]=\mathbb {E}[Y_t]\) for all \(t\in T\). Assume that, for all \(s,t \in T\), we have \(\mathbb {E}[(X_s-X_t)^2] \le \mathbb {E}[(Y_s-Y_t)^2]\). Then

$$\begin{aligned} \mathbb {E}\sup _{t\in T} X_t \le \mathbb {E}\sup _{t\in T} Y_t. \end{aligned}$$

The next lemma is folklore. We include a short proof of an estimate with specific constants for the sake of completeness.

Lemma 2.10

Assume that \(k\ge 2\) and let \(g_i\), \(i\le k\), be standard Gaussian random variables (not necessarily independent). Then

$$\begin{aligned} \mathbb {E}\max _{1\le i\le k} g_i&\le \sqrt{2 \ln k},\\ \mathbb {E}\max _{1\le i\le k} |g_i|&\le 2\sqrt{ \ln k}. \end{aligned}$$

Proof

Since the moment generating function of a Gaussian random variable is given by \(\mathbb {E}e^{tg_1} = e^{t^2/2}\), it follows from Jensen’s inequality that

$$\begin{aligned} \mathbb {E}\max _{i\le k} g_i&\le \frac{1}{t} \ln \bigl ( \mathbb {E}\exp (t\max _{i\le k } g_i) \bigr )\\&\le \frac{1}{t} \ln \bigl (\mathbb {E}\sum _{i=1}^k \exp (t g_i)\bigr ) = \frac{1}{t} \ln \bigl (k e^{t^2/2}\bigr ) = \frac{\ln k}{t} +\frac{t}{2}. \end{aligned}$$

By taking \(t=\sqrt{2\ln k}\), we get the first assertion. We apply this inequality with random variables \(g_1, -g_1, \ldots , g_k, -g_k\) to get the second assertion, namely

$$\begin{aligned} \mathbb {E}\max _{i\le k}|g_i| = \mathbb {E}\max _{i\le k} \max \{g_i, -g_i\}\le \sqrt{2\ln (2k)} \le \sqrt{2\ln (k^2)}=2\sqrt{\ln k}. \end{aligned}$$

\(\square \)
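Both bounds are easy to test by simulation (a NumPy sketch; k and the number of trials are illustrative; we take independent \(g_i\), although the lemma allows arbitrary correlations).

```python
import numpy as np

rng = np.random.default_rng(6)
k, trials = 1000, 2000
G = rng.standard_normal((trials, k))         # i.i.d. case, for illustration
print(G.max(axis=1).mean(), np.sqrt(2 * np.log(k)))          # first bound
print(np.abs(G).max(axis=1).mean(), 2 * np.sqrt(np.log(k)))  # second bound
```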

The next two lemmas are taken from [61]. Recall that \(b_1^{\downarrow {}} \ge \ldots \ge b_n^{\downarrow {}}\) is the non-increasing rearrangement of \((|b_j|)_{j\le n}\).

Lemma 2.11

([61, Lemma 2.3]) Assume that \((b_j)_{j\le n}\in \mathbb {R}^n\) and let \((X_j)_{j\le n}\) be random variables (not necessarily independent) satisfying

$$\begin{aligned} \mathbb {P}(|X_j|>t)\le Ke^{-t^2/b_j^2} \qquad \text {for all } t\ge 0,\ j\le n. \end{aligned}$$

Then

$$\begin{aligned} \mathbb {E}\max _{j\le n} |X_j| \lesssim _{K} \max _{j\le n} b_j^{\downarrow {}} \sqrt{\ln (j+1)}. \end{aligned}$$

Lemma 2.12

([61, Lemma 2.4]) Assume that \((b_j)_{j\le n}\in \mathbb {R}^n\) and let \((X_j)_{j\le n}\) be independent random variables with \(X_j \sim {\mathcal {N}}(0,b_j^{2})\) for \(j\le n\). Then

$$\begin{aligned} \mathbb {E}\max _{j\le n} |X_j| \gtrsim \max _{j\le n} b_j^{\downarrow {}} \sqrt{\ln (j+1)}. \end{aligned}$$

Lemma 2.13

(Hoeffding’s inequality, [32, Theorem 2]) Assume that \((b_j)_{j\le n}\in \mathbb {R}^n\) and let \(X_j\), \(j\le n\), be independent mean-zero random variables such that \(|X_j|\le 1\) a.s. Then, for all \(t\ge 0\),

$$\begin{aligned} \mathbb {P}\bigl (\bigl |\sum _{j=1}^n b_j X_j \bigr |\ge t \bigr ) \le 2 \exp \Bigl ( - \frac{t^2}{2 \sum _{j=1}^n b_j^2 } \Bigr ). \end{aligned}$$
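For example, applied to independent Rademacher variables \(X_j=\varepsilon _j\) with weights \(b_j=1/\sqrt{n}\), Hoeffding’s inequality yields

$$\begin{aligned} \mathbb {P}\Bigl (\Bigl |\frac{1}{\sqrt{n}}\sum _{j=1}^n \varepsilon _j \Bigr |\ge t \Bigr ) \le 2 e^{-t^2/2} \qquad \text {for all } t\ge 0, \end{aligned}$$

i.e., the normalized Rademacher sum satisfies condition (i) of Lemma 2.18 below with \(r=2\) and \(K_1=L_1=2\).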

2.4 Random variables with heavy tails

The following lemma is a special case of [34, Theorem 1].

Lemma 2.14

(Contraction principle) Let \(K, L>0\) and assume that \((\eta _i)_{i\le n}\) and \((\xi _i)_{i\le n}\) are two sequences of independent symmetric random variables satisfying for every \(i\le n\) and \(t\ge 0\),

$$\begin{aligned} \mathbb {P}(|\eta _i| \ge t) \le K \mathbb {P}(L|\xi _i|\ge t). \end{aligned}$$

Then, for every convex function \(\varphi \) and every \(a_1,\ldots , a_n \in \mathbb {R}\),

$$\begin{aligned} \mathbb {E}\varphi \Big (\sum _{i=1}^n a_i \eta _i\Big )\le \mathbb {E}\varphi \Big (KL\sum _{i=1}^n a_i\xi _i\Big ). \end{aligned}$$

Lemma 2.15

([31, Theorem 6.2]) Assume that \(Z_1, \dots , Z_n\) are independent symmetric Weibull random variables with shape parameter \(r\in (0,1]\) and scale parameter 1, i.e., \(\mathbb {P}(|Z_i|\ge t)=e^{-t^r}\) for \(t\ge 0\). Then, for every \(\rho {}\ge 2\) and \(a\in \mathbb {R}^n\),

$$\begin{aligned} \Bigl \Vert \sum _{i=1}^n a_iZ_i \Bigr \Vert _{\rho {}} \asymp \max \bigl \{ \sqrt{\rho {}} \Vert a\Vert _2\Vert Z_1\Vert _2, \Vert a\Vert _{\rho {}}\Vert Z_1\Vert _{\rho {}} \bigr \}. \end{aligned}$$

Remark 2.16

(Moments of Weibull random variables) Note that if Z is a symmetric random variable such that \(\mathbb {P}(|Z|\ge t)=e^{-t^r}\), \(r\in (0,2]\), then \(Y=|Z|^{r}{\text {sgn}}(Z)\) has (symmetric) exponential distribution with parameter 1, so by Stirling’s formula we obtain, for all \(\rho \ge 1\),

$$\begin{aligned} \Vert Z\Vert _{\rho {}} = \Vert Y\Vert _{\rho {}/r}^{1/r} = \Gamma \Bigl (\frac{\rho {}}{r} +1\Bigr )^{1/{\rho {}}} \le \Bigl (\frac{C}{r}\Bigr )^{\frac{1}{r} + \frac{1}{2\rho {}}} \rho {}^{1/r} \le \Bigl (\frac{C}{r}\Bigr )^{\frac{1}{r} + \frac{1}{2}} \rho {}^{1/r}, \end{aligned}$$

with \(C\ge 1\).
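For instance, in the case \(r=1\) (the symmetric exponential distribution) this simply reads \(\Vert Z\Vert _{\rho } = \Gamma (\rho +1)^{1/\rho } \le \rho \) for \(\rho \ge 1\), in accordance with the general bound above.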

The three previous results easily imply the following estimate for integral norms of linear combinations of independent \(\psi _r\) random variables.

Proposition 2.17

Let \(K, L>0\), \(r\in (0,1]\) and assume that \(Z_1, \dots , Z_n\) are independent symmetric random variables satisfying \(\mathbb {P}(|Z_i|\ge t) \le Ke^{-t^r/L}\) for all \(t\ge 0\) and \(i\le n\). Then, for every \(\rho {}\ge 2\) and \(a\in \mathbb {R}^n\),

$$\begin{aligned} \Bigl \Vert \sum _{i=1}^n a_iZ_i \Bigr \Vert _{\rho {}}&\lesssim (C/r)^{\frac{1}{r} + \frac{1}{2}} KL^{1/r}\max \bigl \{ \sqrt{\rho {}} \Vert a\Vert _2 , \rho {}^{1/r} \Vert a\Vert _{\rho {}} \bigr \} \\&\lesssim (C'/r)^{\frac{1}{r} + \frac{1}{2}} KL^{1/r}\max \bigl \{ \sqrt{\rho {}} \Vert a\Vert _2 , \rho {}^{1/r} \Vert a\Vert _\infty \bigr \}. \end{aligned}$$

Proof

The first inequality is an immediate consequence of Lemma 2.14 (applied with \(\eta _i=Z_i\), independent Weibull variables \(\xi _i\) with shape parameter r and scale parameter 1, and with the convex function \(\varphi :t\mapsto |t|^{\rho }\)), Lemma 2.15, and Remark 2.16. The second inequality follows from

$$\begin{aligned} \Vert a\Vert _{\rho {}}\le \Vert a\Vert _2^{2/\rho {}}\Vert a\Vert _\infty ^{1-2/{\rho {}}}&= \rho {}^{\frac{2}{\rho {}r}}\Vert \rho {}^{-1/r}a\Vert _2^{2/\rho {}}\Vert a\Vert _\infty ^{1-2/{\rho {}}} \\&\le {\rho {}}^{\frac{2}{\rho {}r}}\Bigl (\frac{2}{\rho {}^{1+1/r}} \Vert a\Vert _2 + \Bigl (1-2/{\rho {}}\Bigr ) \Vert a\Vert _\infty \Bigr ), \end{aligned}$$

where in the last step we used the inequality between weighted arithmetic and geometric means. \(\square \)

The next lemma is standard and provides us with several equivalent formulations of the \(\psi _r\) property, expressed through tail bounds, growth of moments, and exponential moments, respectively. We provide a brief proof, since in the literature one usually finds versions for \(r\ge 1\) only.

Lemma 2.18

Assume that \(r\in (0,2]\). Let Z be a non-negative random variable. The following conditions are equivalent:

  1. (i)

    There exist \(K_1,L_1>0\) such that

    $$\begin{aligned} \mathbb {P}(Z\ge t) \le K_1 e^{- t^r/ L_1} \quad \text {for all } t\ge 0. \end{aligned}$$
  2. (ii)

    There exists \(K_2\) such that

    $$\begin{aligned} \Vert Z\Vert _{\rho {}} \le K_2 \rho {}^{1/r} \quad \text {for all } \rho {} \ge 1. \end{aligned}$$
  3. (iii)

    There exist \(K_3, u>0\) such that

    $$\begin{aligned} \mathbb {E}\exp (u Z^r) \le K_3. \end{aligned}$$

Here, (i) implies (ii) with \(K_2 = C(r)K_1L_1^{1/r}\), (ii) implies (iii) with \(K_3 = 1+e^{(2e r)^{-1}}\), \(u=(2erK_2^r)^{-1}\), and (iii) implies (i) with \(K_1 = K_3\), \(L_1=u^{-1}\).

Proof

Property (i) implies (ii) by Lemma 2.14 (applied with \(n=1\), \(\eta _1=Z\) and an independent Weibull variable \(\xi _1\) with parameter r) and Remark 2.16. Property (iii) implies (i) by Chebyshev’s inequality:

$$\begin{aligned} \mathbb {P}(Z\ge t)=\mathbb {P}\bigl (\exp (uZ^r)\ge \exp (ut^r)\bigr ) \le K_3 \exp (-ut^r). \end{aligned}$$

Assume now that (ii) holds and denote \(k_0 = \lfloor \frac{1}{r} \rfloor \). Then, for every \(k\in [1,k_0]\), we have \(kr\le 1\), so Jensen’s inequality and property (ii) (applied with \(\rho =1\)) yield

$$\begin{aligned} \mathbb {E}Z^{kr} \le (\mathbb {E}Z)^{kr} \le K_2^{kr}, \end{aligned}$$

while for \(k\ge k_0+1 \), we have \(kr \ge 1\) and, hence, property (ii) yields

$$\begin{aligned} \mathbb {E}Z^{kr} \le K_2^{kr}(kr)^k. \end{aligned}$$

Hence, by Stirling’s formula we have for \(u=(2erK_2^r)^{-1}\),

$$\begin{aligned} \mathbb {E}\exp (uZ^r)&= 1 + \sum _{k=1}^{k_0} \frac{u^k \mathbb {E}Z^{kr}}{k!} + \sum _{k=k_0+1}^\infty \frac{u^k \mathbb {E}Z^{kr}}{k!} \\&\le 1 + \sum _{k=1}^{k_0} \frac{u^k K_2^{kr}}{k!} + \sum _{k=k_0+1}^\infty \frac{u^k K_2^{kr}(kr)^k }{\bigl (k/e\bigr )^k} \\&= 1 + \sum _{k=1}^{k_0} \frac{u^kK_2^{kr}}{k!} + \sum _{k=k_0+1}^\infty 2^{-k} \le e^{uK_2^r} +1. \end{aligned}$$

\(\square \)
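For instance, \(Z=|g|\) with \(g\sim {\mathcal {N}}(0,1)\) satisfies (i) with \(r=2\) and \(K_1=L_1=2\), and, in accordance with (ii), \(\Vert g\Vert _{\rho } \le \sqrt{\rho }\) for all \(\rho \ge 1\).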

The next lemma states that a linear combination of independent \(\psi _r\) random variables is a \(\psi _r\) random variable.

Lemma 2.19

Assume that \(K, L>0\), \(r\in (0,2]\), and let \((Z_i)_{i\le k}\) be independent symmetric random variables satisfying \(\mathbb {P}(|Z_i|\ge t) \le Ke^{-t^r/L}\) for all \(t\ge 0\). Then for every \(a\in \mathbb {R}^k\) the random variable \(Y{:=}\Vert a\Vert _2^{-1}\sum _{i=1}^k a_iZ_i\) satisfies, for all \(t\ge 0\),

$$\begin{aligned} \mathbb {P}(|Y|\ge t) \le K'e^{-t^r/L'}, \end{aligned}$$

where \(K'\), \(L'\) depend only on K, L, and r.

Proof

The case \(r\ge 1\) is standard (see, e.g., [14, Theorem 1.2.5]), so we skip the proof in this case; alternatively, for \(r\ge 1\) it suffices to use the result of Gluskin and Kwapień [19] (together with Lemma 2.14) instead of Lemma 2.15 in the proof below.

Assume that \(r\in (0,1]\) and recall that \(Y=\Vert a\Vert _2^{-1}\sum _{i=1}^k a_iZ_i\). By Proposition 2.17,

$$\begin{aligned} \Vert Y\Vert _{\rho {}} \lesssim _{K,L,r} \max \{ \sqrt{\rho {}}, \rho {}^{1/r} \} = \rho {}^{1/r} \qquad \text {for all } \rho {} \ge 1. \end{aligned}$$

Hence, Lemma 2.18 yields the assertion. \(\square \)

Lemma 2.20

Assume that \(r\in (0,2]\), \(\frac{1}{s}{:=}\frac{1}{r}-\frac{1}{2}\), Y is a non-negative random variable such that \(\mathbb {P}(Y\ge t)=e^{-t^s}\) for all \(t\ge 0\), and \(g\sim {\mathcal {N}}(0,1)\) is independent of Y. Then, for every \(t\ge 0\),

$$\begin{aligned} \mathbb {P}\bigl (|g|Y\ge t\bigr )\ge c e^{-4t^r}, \end{aligned}$$

where \(c:=\sqrt{2/\pi }e^{-2}\).

Proof

In the case \(r=2\) we have \(s=\infty \) and then \(Y=1\) almost surely and the assertion is trivial. Assume now that \(r<2\). By our assumptions \(r=\frac{2s}{2+s}\). Let \(x_0{:=}(2t^s)^{1/(2+s)}\). Note that \(x\ge x_0\) is equivalent to \(\frac{t^s}{x^s}\le \frac{x^2}{2}\). Thus,

$$\begin{aligned} \mathbb {P}\bigl (|g|Y\ge t\bigr )&=\mathbb {E}e^{-\frac{t^s}{|g|^s}} = \sqrt{\frac{2}{\pi }} \int _0^\infty e^{-\frac{t^s}{x^s}-\frac{x^2}{2}} dx \ge \sqrt{\frac{2}{\pi }} \int _{x_0}^{x_0+1} e^{-\frac{t^s}{x^s}-\frac{x^2}{2}} dx \\ {}&\ge \sqrt{\frac{2}{\pi }} \int _{x_0}^{x_0+1} e^{-x^2} dx \ge \sqrt{\frac{2}{\pi }} e^{-(x_0+1)^2} \ge \sqrt{\frac{2}{\pi }} e^{-2(x_0^2+1)} \\ {}&= c e^{-2x_0^2} \ge ce^{-4t^{2s/(2+s)}} = ce^{-4t^r}, \end{aligned}$$

where we used \(2^{{2/(2+s)}}\le 2\) and chose \(c{:=}\sqrt{2/\pi }e^{-2}\). \(\square \)
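The lower bound of Lemma 2.20 is of optimal order when \(r<2\): if \(|g|Y\ge t\), then \(|g|\ge t^{r/2}\) or \(Y\ge t^{r/s}\) (since \(t^{r/2}\, t^{r/s} = t^{r(1/2+1/s)} = t\)), so a union bound yields, for every \(t\ge 0\),

$$\begin{aligned} \mathbb {P}\bigl (|g|Y\ge t\bigr ) \le \mathbb {P}\bigl (|g|\ge t^{r/2}\bigr ) + \mathbb {P}\bigl (Y\ge t^{r/s}\bigr ) \le 2e^{-t^r/2} + e^{-t^r} \le 3e^{-t^r/2}. \end{aligned}$$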

Lemma 2.21

Assume that \(K, L>0\), \(r\in (0,2]\) and that Z is a random variable satisfying \(\mathbb {P}(|Z|\ge t) \le Ke^{-t^r/L}\) for all \(t\ge 0\). Let Y, g, and \(c=\sqrt{2/\pi }e^{-2}\) be as in Lemma  2.20. Then there exist random variables \(U\sim |Z|\) and \(V\sim |g|Y\) such that

$$\begin{aligned} U\le (8L)^{1/r}\Bigl (\Bigl (\frac{\ln (K/c)}{4}\Bigr )^{1/r} +V \Bigr ) \quad \text {a.s.} \end{aligned}$$

Proof

For \(t=0\) we have \(1=\mathbb {P}(|Z|\ge 0)\le K\), so \(K\ge 1\), and thus \(\ln (K/c)=\ln (Ke^2\sqrt{\pi /2})>0\). We use our assumptions, the inequality \((a+b)^r\ge (a^r+b^r)/2\), and Lemma 2.20 to obtain for any \(t\ge 0\),

$$\begin{aligned} \mathbb {P}\Bigl ((8L)^{-1/r}|Z| \ge t+ \bigl (\ln (K/c)/4\bigr )^{1/r} \Bigr )&\le K\exp \Bigl (-8\Bigl [t+ \bigl (\ln (K/c)/4\bigr )^{1/r}\Bigr ]^r\Bigr ) \\ {}&\le K \exp \Bigl (-4\bigl (t^r+ \ln (K/c)/4\bigr )\Bigr ) = ce^{-4t^r} \\ {}&\le \mathbb {P}\bigl (|g|Y\ge t\bigr ). \end{aligned}$$

Consider the version U of |Z| and the version V of |g|Y defined on the (common) probability space (0, 1) equipped with Lebesgue measure, constructed as the (generalised) inverses of cumulative distribution functions of |Z| and |g|Y, respectively. Then \((8L)^{-1/r} U - \bigl (\ln (K/c)/4\bigr )^{1/r} \le V \), which implies the assertion. \(\square \)

Lemma 2.22

Let \(K,L>0\), \(r\in (0,2]\) and \(k\ge 3\), and assume that \((Z_i)_{i\le k}\) are random variables satisfying \(\mathbb {P}(|Z_i|\ge t) \le Ke^{-t^r/L}\) for all \(t\ge 0\). Then

$$\begin{aligned} \mathbb {P}\bigl (\max _{i\le k} |Z_i|\ge (vL \ln k)^{1/r}\bigr )\le K k^{-v+1} \le eKe^{-v} \qquad \text {for every }v\ge 1 \end{aligned}$$

and

$$\begin{aligned} \mathbb {E}\max _{i\le k} |Z_i| \lesssim \bigl ( LK^r r^{-1} \ln k \bigr )^{1/r} \lesssim _{r,K,L} (\ln k)^{1/r}. \end{aligned}$$

Proof

By a union bound and the assumptions we get, for every \(v\ge 1\),

$$\begin{aligned} \mathbb {P}\bigl (\max _{i\le k} |Z_i|\ge (vL \ln k)^{1/r}\bigr )&\le \sum _{i=1}^k \mathbb {P}\bigl ( |Z_i|\ge (vL \ln k)^{1/r}\bigr ) \le k\cdot Ke^{-v\ln k} \\&=Ke^{-(v-1)\ln k} = K k^{-v+1} \le eKe^{-v}, \end{aligned}$$

where we used \(k\ge 3\) in the last step. We integrate by parts, change the variables, and use the above bound to obtain the second part of the assertion, i.e.,

$$\begin{aligned} \mathbb {E}\max _{i\le k} |Z_i|&= \int _0^\infty \mathbb {P}\bigl (\max _{i\le k} |Z_i|\ge u\bigr ) du \le (L \ln k)^{1/r} + \int _{(L \ln k)^{1/r}}^\infty \mathbb {P}\bigl (\max _{i\le k} |Z_i|\ge u\bigr ) du \\&=(L \ln k)^{1/r} + \frac{(L \ln k)^{1/r} }{r} \int _{1}^\infty v^{\frac{1}{r} -1} \mathbb {P}\bigl (\max _{i\le k} |Z_i|\ge (vL \ln k)^{1/r}\bigr ) dv \\&\le (L \ln k)^{1/r} \Bigl (1 + \frac{eK}{r} \int _{1}^\infty v^{\frac{1}{r} -1}e^{-v} dv \Bigr ) \\&\le (L \ln k)^{1/r} \Bigl (1 + eK\ \Gamma \Bigl (\frac{1}{r} +1\Bigr ) \Bigr ). \end{aligned}$$

\(\square \)
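In the Gaussian case this is consistent with Lemma 2.10: standard Gaussian random variables satisfy the assumption of Lemma 2.22 with \(r=2\) and \(K=L=2\), and the lemma then gives \(\mathbb {E}\max _{i\le k} |Z_i| \lesssim \sqrt{\ln k}\).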

3 Proofs of the main results

After the preparation in the previous section, we shall now present the proofs of our main results.

3.1 General bound via Slepian’s lemma

In order to obtain Theorem 1.3, we first prove a weaker version of it, for \(p=\infty \) and \(q=1\) only. After that we shall use the polytope K from Lemma 2.3 and Gaussian concentration to see how Proposition 3.1 implies the general bound. The proof of this proposition relies on symmetrization together with the contraction principle, which allow us to get rid of the vectors y and x, and on Slepian’s lemma.

Proposition 3.1

Assume that \(G=(g_{ij})_{i\le m, j\le n}\) has i.i.d. standard Gaussian entries and \(k \le m\), \(l\le n\). Then

$$\begin{aligned} \mathbb {E}\sup _{I, J} \sup _{y\in B_\infty ^m}\sup _{x\in B_\infty ^n} \sum _{i\in I, j\in J} y_i a_{ij} g_{ij} x_j&\le \bigl (8\sqrt{\ln m} + \sqrt{2/\pi }\bigr ) \sup _{I, J} \sum _{i\in I} \sqrt{\sum _{j\in J} a_{ij}^2} \nonumber \\&\quad + \bigl (8\sqrt{\ln n} +2 \sqrt{2/\pi }\bigr ) \sup _{I, J} \sum _{j\in J} \sqrt{\sum _{i\in I} a_{ij}^2 }, \end{aligned}$$

where the suprema are taken over all sets \(I\subset \{1,\ldots ,m\}\), \(J\subset \{1,\ldots ,n\}\) such that \(|I| = k\), \(|J| = l\).

Proof

Throughout the proof, \(k \le m\) and \(l\le n\) are fixed and the suprema are taken over all index sets satisfying \(I\subset \{1,\ldots ,m\}\), \(|I| = k\) and \(J\subset \{1,\ldots ,n\}\), \(|J| = l\).

Let us denote by \(({\widetilde{g}}_{ij})_{i\le m, j\le n}\) an independent copy of \((g_{ij})_{i\le m, j\le n}\). Using the duality \((\ell _1^m)^*=\ell _\infty ^m\), centering the expression, noticing that \(\sum _{j\in J} a_{ij} {\widetilde{g}}_{ij} x_j \) is a Gaussian random variable with standard deviation \(\sqrt{\sum _{j\in J} a_{ij}^2 x_j^2}\), and using Jensen’s inequality, we see that

$$\begin{aligned}&\mathbb {E}\sup _{I, J} \sup _{x\in B_\infty ^n} \sup _{y\in B_\infty ^m} \sum _{i\in I, j\in J} y_i a_{ij} g_{ij} x_j = \mathbb {E}\sup _{I, J} \sup _{x\in B_\infty ^n} \sum _{i\in I} \Bigl | \sum _{j\in J} a_{ij} g_{ij} x_j \Bigr | \nonumber \\&\quad \le \mathbb {E}\sup _{I, J} \sup _{x\in B_\infty ^n} \sum _{i\in I} \Bigl ( \Bigl | \sum _{j\in J} a_{ij} g_{ij} x_j \Bigr | - \mathbb {E}\Bigl | \sum _{j\in J} a_{ij} {\widetilde{g}}_{ij} x_j \Bigr |\Bigr ) \nonumber \\&\qquad + \sup _{I, J} \sup _{x\in B_\infty ^n} \sum _{i\in I} \mathbb {E}\Bigl | \sum _{j\in J} a_{ij} {\widetilde{g}}_{ij} x_j \Bigr | \nonumber \\&\quad = \mathbb {E}\sup _{I, J} \sup _{x\in B_\infty ^n} \sum _{i\in I} \Bigl ( \Bigl | \sum _{j\in J} a_{ij} g_{ij} x_j \Bigr | - \mathbb {E}\Bigl | \sum _{j\in J} a_{ij} {\widetilde{g}}_{ij} x_j \Bigr |\Bigr ) \nonumber \\&\qquad + \sup _{I, J} \sup _{x\in B_\infty ^n} \sum _{i\in I} \sqrt{\sum _{j\in J} a_{ij}^2 x_j^2} \mathbb {E}|g|\nonumber \\&\quad \le \mathbb {E}\sup _{I, J} \sup _{x\in B_\infty ^n} \sum _{i\in I} \Bigl ( \Bigl | \sum _{j\in J} a_{ij} g_{ij} x_j \Bigr | - \Bigl | \sum _{j\in J} a_{ij} {\widetilde{g}}_{ij} x_j \Bigr |\Bigr ) + \sqrt{\frac{2}{\pi }}\sup _{I, J} \sum _{i\in I} \sqrt{\sum _{j\in J} a_{ij}^2 } . \end{aligned}$$
(3.1)

To estimate the expected value on the right-hand side, we use a symmetrization trick together with the contraction principle (Lemma  2.8). Let \((\varepsilon _i)_{i\le m}\) be a sequence of independent Rademacher random variables independent of all others. Since the random vectors

$$\begin{aligned} Z_i = \Bigl ({\textbf{1}}_{\{i\in I\}} \Bigl ( \bigl |\sum _{j\in J} a_{ij}g_{ij}x_j \bigr | - \bigl |\sum _{j\in J} a_{ij}{\widetilde{g}}_{ij}x_j \bigr | \Bigr ) \Bigr )_{I\subset [m], J\subset [n], x\in B_\infty ^n} \end{aligned}$$

(where \(i\le m\)) are independent and symmetric, \((Z_i)_{i\le m}\) has the same distribution as \((\varepsilon _iZ_i)_{i\le m}\). Therefore,

$$\begin{aligned}&\mathbb {E}\sup _{I, J} \sup _{x\in B_\infty ^n} \sum _{i\in I} \Bigl ( \Bigl | \sum _{j\in J} a_{ij} g_{ij} x_j \Bigr | - \Bigl | \sum _{j\in J} a_{ij} {\widetilde{g}}_{ij} x_j \Bigr |\Bigr ) \nonumber \\&\quad = \mathbb {E}\sup _{I, J} \sup _{x\in B_\infty ^n} \sum _{i\in I} \varepsilon _i \Bigl ( \Bigl | \sum _{j\in J} a_{ij} g_{ij} x_j \Bigr | - \Bigl | \sum _{j\in J} a_{ij} {\widetilde{g}}_{ij} x_j \Bigr |\Bigr ) \nonumber \\&\quad \le 2 \mathbb {E}\sup _{I, J} \sup _{x\in B_\infty ^n} \sum _{i\in I} \varepsilon _i \Bigl | \sum _{j\in J} a_{ij} g_{ij} x_j \Bigr | \nonumber \\&\quad = 2 \mathbb {E}\sup _{I, J} \sup _{x\in B_\infty ^n} \sum _{i=1}^m \varepsilon _i \Bigl | \sum _{j\in J} a_{ij} g_{ij} x_j {\textbf{1}}_{\{i\in I\}}\Bigr | . \end{aligned}$$
(3.2)

Applying (conditionally, with the values of \(g_{ij}\)’s fixed) the contraction principle (i.e., Lemma 2.8) with the set

$$\begin{aligned} T=\biggl \{\Bigl (\sum _{j\in J} a_{ij} g_{ij} x_j {\textbf{1}}_{\{i\in I\}}\Bigr )_{i\le m} :I\subset [m], |I|=k, J\subset [n], |J|=l, x\in B_\infty ^n \biggr \} \end{aligned}$$

and the function \(u\mapsto |u|\) (which is 1-Lipschitz and takes the value 0 at the origin), we get

$$\begin{aligned}&\mathbb {E}\sup _{I, J} \sup _{x\in B_\infty ^n} \sum _{i=1}^m \varepsilon _i \Bigl | \sum _{j\in J} a_{ij} g_{ij} x_j {\textbf{1}}_{\{i\in I\}}\Bigr | \le \mathbb {E}\sup _{I, J} \sup _{x\in B_\infty ^n} \sum _{i=1}^m \varepsilon _i \sum _{j\in J} a_{ij} g_{ij} x_j {\textbf{1}}_{\{i\in I\}} \nonumber \\&\quad = \mathbb {E}\sup _{I, J} \sup _{x\in B_\infty ^n}\sum _{j\in J} \sum _{i\in I} a_{ij} \varepsilon _i g_{ij} x_j = \mathbb {E}\sup _{I, J} \sup _{x\in B_\infty ^n}\sum _{j\in J} \sum _{i\in I} a_{ij} g_{ij} x_j . \end{aligned}$$
(3.3)

By proceeding similarly as in (3.1), we obtain

$$\begin{aligned}&\mathbb {E}\sup _{I, J} \sup _{x\in B_\infty ^n}\sum _{j\in J} \sum _{i\in I} a_{ij} g_{ij} x_j = \mathbb {E}\sup _{I, J} \sum _{j\in J} \Bigl | \sum _{i\in I} a_{ij} g_{ij} \Bigr |\nonumber \\&\quad \le \mathbb {E}\sup _{I, J} \sum _{j\in J} \Bigl ( \Bigl | \sum _{i\in I} a_{ij} g_{ij} \Bigr | - \mathbb {E}\Bigl | \sum _{i\in I} a_{ij} {\widetilde{g}}_{ij} \Bigr |\Bigr ) + \sqrt{\frac{2}{\pi }} \sup _{I, J} \sum _{j\in J} \sqrt{\sum _{i\in I} a_{ij}^2} . \end{aligned}$$
(3.4)

Observe that using symmetrization and the contraction principle similarly as in (3.2) and (3.3), we can estimate the first summand on the right-hand side of (3.4) as follows,

$$\begin{aligned} \mathbb {E}\sup _{I, J} \sum _{j\in J} \Bigl ( \Bigl | \sum _{i\in I} a_{ij} g_{ij} \Bigr | - \mathbb {E}\Bigl | \sum _{i\in I} a_{ij} {\widetilde{g}}_{ij} \Bigr |\Bigr ) \le 2 \mathbb {E}\sup _{I, J} \sum _{i\in I} \sum _{j\in J} a_{ij} g_{ij}. \end{aligned}$$
(3.5)

Altogether, the inequalities in (3.1) – (3.5) yield that

$$\begin{aligned} \mathbb {E}\sup _{I, J} \sup _{y\in B_\infty ^m}\sup _{x\in B_\infty ^n} \sum _{i\in I, j\in J} y_i a_{ij} g_{ij} x_j&\le 4 \mathbb {E}\sup _{I, J} \sum _{i\in I} \sum _{j\in J} a_{ij} g_{ij} + 2 \sqrt{\frac{2}{\pi }} \sup _{I, J} \sum _{j\in J} \sqrt{\sum _{i\in I} a_{ij}^2} \nonumber \\&\quad + \sqrt{\frac{2}{\pi }}\sup _{I, J} \sum _{i\in I} \sqrt{\sum _{j\in J} a_{ij}^2 } . \end{aligned}$$
(3.6)

We shall now estimate the first summand on the right-hand side of (3.6) using Slepian’s lemma (i.e., Lemma  2.9). Denote

$$\begin{aligned} X_{I,J}&{:=}\sum _{i\in I} \sum _{j\in J} a_{ij} g_{ij},\\ Y_{I,J}&{:=}\sum _{i\in I} g_i \sqrt{\sum _{j\in J} a_{ij}^2 } + \sum _{j\in J} {\widetilde{g}}_j \sqrt{\sum _{i\in I} a_{ij}^2 }, \end{aligned}$$

where \(g_i, i=1,\ldots ,m\), \( {\widetilde{g}}_j, j=1,\ldots ,n\) are independent standard Gaussian variables. The random variables \(X_{I,J}, Y_{I,J}\) clearly have zero mean. Thus, we only need to calculate and compare \(\mathbb {E}(X_{I,J} - X_{{\widetilde{I}}, {\widetilde{J}}})^2\) and \(\mathbb {E}(Y_{I,J} - Y_{{\widetilde{I}}, {\widetilde{J}}})^2\). In the calculations below it will be evident over which sets the index i (resp. j) runs, so in order to shorten the notation and improve readability, we use the notational convention

$$\begin{aligned} \sum _{I} {:=}\sum _{i\in I}, \qquad \sum _{{\widetilde{J}}} {:=}\sum _{j\in {\widetilde{J}}}, \qquad \sum _{I\cap {\widetilde{I}}, J\setminus {\widetilde{J}}}{:=}\sum _{{i\in I\cap {\widetilde{I}}, j\in J\setminus {\widetilde{J}}}}, \qquad \text {etc.} \end{aligned}$$

By independence,

$$\begin{aligned} \mathbb {E}(X_{I,J} - X_{{\widetilde{I}}, {\widetilde{J}}})^2 = \sum _{I, J} a_{ij}^2 + \sum _{{\widetilde{I}},{\widetilde{J}}} a_{ij}^2 - 2 \sum _{I\cap {\widetilde{I}}, J\cap {\widetilde{J}}} a_{ij}^2. \end{aligned}$$

By independence and the inequality \(2\sqrt{ab} \le a+b\) (valid for \(a,b\ge 0\)),

$$\begin{aligned} \mathbb {E}(Y_{I,J} - Y_{{\widetilde{I}}, {\widetilde{J}}})^2&= 2\sum _{I, J} a_{ij}^2 + 2\sum _{{\widetilde{I}}, {\widetilde{J}}} a_{ij}^2\\&\qquad - 2 \sum _{I\cap {\widetilde{I}}} \sqrt{\sum _{ J } a_{ij}^2}\sqrt{\sum _{ {\widetilde{J}} } a_{ij}^2} - 2 \sum _{J\cap {\widetilde{J}}} \sqrt{\sum _{I } a_{ij}^2}\sqrt{\sum _{{\widetilde{I}} } a_{ij}^2}\\&\ge 2\sum _{I,J} a_{ij}^2 + 2\sum _{{\widetilde{I}}, {\widetilde{J}}} a_{ij}^2 - \sum _{I\cap {\widetilde{I}}, J } a_{ij}^2 - \sum _{I\cap {\widetilde{I}},{\widetilde{J}} } a_{ij}^2 - \sum _{I , J\cap {\widetilde{J}}} a_{ij}^2 - \sum _{{\widetilde{I}}, J\cap {\widetilde{J}}} a_{ij}^2\\&= \sum _{I, J} a_{ij}^2 + \sum _{{\widetilde{I}}, {\widetilde{J}}} a_{ij}^2 - \sum _{I\cap {\widetilde{I}}, J } a_{ij}^2 - \sum _{I\cap {\widetilde{I}}, {\widetilde{J}} } a_{ij}^2 + \sum _{I , J\setminus {\widetilde{J}}} a_{ij}^2 + \sum _{{\widetilde{I}}, {\widetilde{J}}\setminus J } a_{ij}^2. \end{aligned}$$

Thus, we clearly have

$$\begin{aligned} \mathbb {E}(X_{I,J} - X_{{\widetilde{I}}, {\widetilde{J}}})^2 \le \mathbb {E}(Y_{I,J} - Y_{{\widetilde{I}}, {\widetilde{J}}})^2 \end{aligned}$$

(cf. Remark 3.2 below). Hence, by Slepian’s lemma (Lemma 2.9) and Lemma 2.10 on the expected maxima of standard Gaussian random variables,

$$\begin{aligned} \mathbb {E}\sup _{I, J} \sum _{i\in I} \sum _{j\in J} a_{ij} g_{ij}&\le \mathbb {E}\sup _{I, J} \Biggl [\sum _{i\in I} g_i \sqrt{\sum _{j\in J} a_{ij}^2 } + \sum _{j\in J} {\widetilde{g}}_j \sqrt{\sum _{i\in I} a_{ij}^2 } \Biggr ]\\&\le \mathbb {E}\sup _{I, J} \sum _{i\in I} g_i \sqrt{\sum _{j\in J} a_{ij}^2} + \mathbb {E}\sup _{I, J} \sum _{j\in J} {\widetilde{g}}_j \sqrt{\sum _{i\in I} a_{ij}^2 } \\&\le \mathbb {E}\sup _{i\le m} |g_i| \sup _{I, J} \sum _{i\in I} \sqrt{\sum _{j\in J} a_{ij}^2} +\mathbb {E}\sup _{j\le n} |{\widetilde{g}}_j | \sup _{I, J}\sum _{j\in J} \sqrt{\sum _{i\in I} a_{ij}^2 } \\&\le 2\sqrt{\ln m} \sup _{I, J} \sum _{i\in I} \sqrt{\sum _{j\in J} a_{ij}^2} +2\sqrt{\ln n} \sup _{I, J} \sum _{j\in J} \sqrt{\sum _{i\in I} a_{ij}^2 } . \end{aligned}$$

Recalling the estimate (3.6), we arrive at

$$\begin{aligned} \mathbb {E}\sup _{I, J} \sup _{y\in B_\infty ^m}\sup _{x\in B_\infty ^n} \sum _{i\in I, j\in J} y_i a_{ij} g_{ij} x_j&\le \bigl (8\sqrt{\ln m} + \sqrt{2/\pi }\bigr ) \sup _{I, J} \sum _{i\in I} \sqrt{\sum _{j\in J} a_{ij}^2} \\&\qquad + \bigl (8\sqrt{\ln n} +2 \sqrt{2/\pi }\bigr ) \sup _{I, J} \sum _{j\in J} \sqrt{\sum _{i\in I} a_{ij}^2 }, \end{aligned}$$

which completes the proof of Proposition 3.1. \(\square \)
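To illustrate the statement, consider the all-ones matrix \(a_{ij}\equiv 1\). Then \(\sup _{I, J} \sum _{i\in I} \sqrt{\sum _{j\in J} a_{ij}^2} = k\sqrt{l}\) and \(\sup _{I, J} \sum _{j\in J} \sqrt{\sum _{i\in I} a_{ij}^2} = l\sqrt{k}\), so Proposition 3.1 yields

$$\begin{aligned} \mathbb {E}\sup _{I, J} \sup _{y\in B_\infty ^m}\sup _{x\in B_\infty ^n} \sum _{i\in I, j\in J} y_i g_{ij} x_j \lesssim k\sqrt{l\ln m} + l\sqrt{k\ln n}. \end{aligned}$$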

Remark 3.2

In the above proof, we also have

$$\begin{aligned} \mathbb {E}(X_{I,J} - X_{{\widetilde{I}}, {\widetilde{J}}})^2&= \sum _{ I, J} a_{ij}^2 + \sum _{{\widetilde{I}},{\widetilde{J}}} a_{ij}^2 - \sum _{I\cap {\widetilde{I}}, J\cap {\widetilde{J}} } a_{ij}^2 - \sum _{J\cap {\widetilde{J}}, I\cap {\widetilde{I}}} a_{ij}^2\\&\ge \sum _{ I, J} a_{ij}^2 + \sum _{{\widetilde{I}},{\widetilde{J}}} a_{ij}^2 - \sum _{I\cap {\widetilde{I}}} \sqrt{\sum _{J } a_{ij}^2} \sqrt{\sum _{{\widetilde{J}} } a_{ij}^2} - \sum _{J\cap {\widetilde{J}} } \sqrt{\sum _{I} a_{ij}^2} \sqrt{\sum _{ {\widetilde{I}}} a_{ij}^2}\\&= \frac{1}{2} \mathbb {E}(Y_{I,J} - Y_{{\widetilde{I}}, {\widetilde{J}}})^2. \end{aligned}$$

Therefore, by Slepian’s lemma (Lemma 2.9) we may reverse the estimate from the proof as follows:

$$\begin{aligned} \mathbb {E}\sup _{I, J} \sup _{y\in B_\infty ^m}\sup _{x\in B_\infty ^n} \sum _{i\in I, j\in J} y_i a_{ij} g_{ij} x_j \ge \frac{1}{\sqrt{2}} \mathbb {E}\sup _{I, J} \biggl [\sum _{i\in I} g_i \sqrt{\sum _{j\in J} a_{ij}^2 } + \sum _{j\in J} {\widetilde{g}}_j \sqrt{\sum _{i\in I} a_{ij}^2 } \biggr ]. \end{aligned}$$

Proof of Theorem 1.3

Recall that \(\sup _{I_0,J_0}\) stands for the supremum taken over all sets \(I_0\subset [M] {:=}\{1,\ldots , M \} \), \(J_0\subset [N] {:=}\{1,\ldots , N \}\) with \(|I_0|=m\), \(|J_0|=n\). Given such sets \(I_0\), \( {J_0}\), we introduce the sets

$$\begin{aligned} K&= K(I_0) {:=}{\text {conv}}\Bigl \{ \frac{1}{|I|^{1/q^*}}\bigl ( \varepsilon _i {\textbf{1}}_{\{i\in I\}} \bigr )_{i\in I_0} :I\subset I_0, I\ne \emptyset , (\varepsilon _i )_{i\in I_0}\in \{-1,1\}^{I_0} \Bigr \},\\ L&= L(J_0){:=}{\text {conv}}\Bigl \{ \frac{1}{|J|^{1/p}}\bigl ( \eta _j {\textbf{1}}_{\{j\in J\}} \bigr )_{j\in {J_0}} :J\subset {J_0}, J\ne \emptyset , (\eta _j )_{j\in {J_0}}\in \{-1,1\}^ {J_0}\Bigr \}. \end{aligned}$$

Then, by Lemma 2.3, \(B_{q^*}^{I_0} \subset \ln (em)^{1/q} K\) and \(B_{p}^{J_0}\subset \ln (en)^{1/p^*} L\). Therefore,

$$\begin{aligned}&\mathbb {E}\sup _{I_0, J_0} \sup _{x\in B_p^ {J_0}} \sup _{y\in B_{q^*}^ {I_0}} \sum _{i\in I_0} \sum _{j\in J_0} y_i a_{ij} g_{ij} x_j \nonumber \\&\quad \le \ln (em)^{1/q} \ln (en)^{1/p^*} \nonumber \\&\qquad \cdot \mathbb {E}\sup _{I_0, J_0} \sup _{I\subset I_0, J\subset J_0} \sup \Bigl \{ \frac{1}{|I|^{1/q^*}|J|^{1/p}} \sum _{i\in I}\sum _{j\in J} \varepsilon _ia_{ij} g_{ij}\eta _j : \ \varepsilon _i, \eta _j \in \{-1,1 \} \Bigr \} \nonumber \\&\quad = \ln (em)^{1/q} \ln (en)^{1/p^*}\nonumber \\&\qquad \cdot \mathbb {E}\max _{k\le m, l\le n} \frac{1}{k^{1/q^*} l^{1/p}} \sup _{I\subset [M], |I|=k} \sup _{J\subset [N], |J|=l} \sup _{x\in B_\infty ^N} \sup _{y\in B_\infty ^M} \sum _{i\in I}\sum _{j\in J}y_ia_{ij}g_{ij}x_j \nonumber \\&\quad = \ln (em)^{1/q} \ln (en)^{1/p^*} \mathbb {E}\max _{k\le m, l\le n} Z_{k,l}, \end{aligned}$$
(3.7)

where we denoted

$$\begin{aligned} Z_{k,l}{:=}\frac{1}{k^{1/q^*}l^{1/p}} \sup _{I,J} \sup _{x\in B_\infty ^N} \sup _{y\in B_\infty ^M} \sum _{i\in I}\sum _{j\in J} y_ia_{ij} g_{ij}x_j, \end{aligned}$$

with the suprema here (and later on in this proof) being always taken over all sets \(I\subset [M], |I|=k\) and \(J\subset [N], |J|=l\).

Proposition 3.1 only tells us that, for each fixed \(k\le m\) and \(l\le n\),

$$\begin{aligned} \mathbb {E}Z_{k,l}&\le \bigl (8\sqrt{\ln M} + \sqrt{2/\pi }\bigr ) \frac{1}{k^{1/q^*}l^{1/p}} \sup _{I, J} \sum _{i\in I} \sqrt{\sum _{j\in J} a_{ij}^2} \nonumber \\&\quad + \bigl (8\sqrt{\ln N} +2 \sqrt{2/\pi }\bigr ) \frac{1}{k^{1/q^*}l^{1/p}} \sup _{I, J} \sum _{j\in J} \sqrt{\sum _{i\in I} a_{ij}^2 }, \end{aligned}$$
(3.8)

but we shall use the Gaussian concentration and the union bound to obtain an estimate for \(\mathbb {E}\max _{k\le m, l\le n} Z_{k,l}.\)

Note first that \((k^{-1/q^*} {\textbf{1}}_{\{i\in I\}})_{i\in I_0}\in K(I_0)\subset B_{q^*}^{I_0}\) and \((l^{-1/p} {\textbf{1}}_{\{j\in J\}})_{j\in J_0}\in L(J_0)\subset B_{p}^{J_0}\), provided that \(|I|=k\), \(|J|=l\), \(I\subset I_0\), \(J\subset J_0\). Therefore,

$$\begin{aligned} \frac{1}{k^{1/q^*}l^{1/p}} \sup _{I, J} \sum _{i\in I} \sqrt{\sum _{j\in J} a_{ij}^2}&\le \sup _{I_0, J_0} \sup _{x\in B_{p}^{J_0}} \sup _{y\in B_{q^*}^{I_0}} \sum _{i\in I_0} y_i\sqrt{\sum _{j\in J_0} a_{ij}^2x_j^2} \\&= \sup _{I_0, J_0} \sup _{z\in B_{p/2}^{J_0}} \Bigl (\sum _{i\in I_0} \bigl ( \sum _{j\in J_0} a_{ij}^2z_j\bigr )^{q/2}\Bigr )^{1/q} \\ {}&= \sup _{I_0, J_0} \Vert A\mathbin {\circ }A :\ell ^{J_0}_{p/2} \rightarrow \ell ^{I_0}_{q/2}\Vert ^{1/2} \end{aligned}$$

and, similarly,

$$\begin{aligned} \frac{1}{k^{1/q^*}l^{1/p}} \sup _{I, J} \sum _{j\in J} \sqrt{\sum _{i\in I} a_{ij}^2 } \le \sup _{I_0, J_0} \Vert (A\mathbin {\circ }A)^T :\ell ^{I_0}_{q^*/2} \rightarrow \ell ^{J_0}_{p^*/2}\Vert ^{1/2}. \end{aligned}$$

This together with the estimate in (3.8) gives

$$\begin{aligned} \mathbb {E}Z_{k,l}&\le \bigl (8\sqrt{\ln M} + \sqrt{2/\pi }\bigr )\sup _{I_0, J_0} \Vert A\mathbin {\circ }A :\ell ^{J_0}_{p/2} \rightarrow \ell ^{I_0}_{q/2}\Vert ^{1/2} \nonumber \\&\quad + \bigl (8\sqrt{\ln N} +2 \sqrt{2/\pi }\bigr ) \sup _{I_0, J_0} \Vert (A\mathbin {\circ }A)^T :\ell ^{I_0}_{q^*/2} \rightarrow \ell ^{J_0}_{p^*/2}\Vert ^{1/2} . \end{aligned}$$
(3.9)

Note that by the Cauchy–Schwarz inequality, the function

$$\begin{aligned} z\mapsto \frac{1}{k^{1/q^*}l^{1/p}} \sup _{I,J} \sup _{x\in B_\infty ^N} \sup _{y\in B_\infty ^M} \sum _{i\in I}\sum _{j\in J} y_ia_{ij}z_{ij}x_j \end{aligned}$$

is D-Lipschitz with

$$\begin{aligned} D\le \frac{1}{k^{1/q^*}l^{1/p}} \sup _{I, J} \sqrt{ \sum _{j\in J} \sum _{i\in I} a_{ij}^2 }&\le \sup _{I, J} \sqrt{ \sup _{x\in B_{p/2}^N}\sup _{y\in B_{q^*/2}^M} \sum _{i\in I}\sum _{j\in J} y_i a_{ij}^2 x_j} \\&\le \sup _{I_0, J_0} \sqrt{ \sup _{x\in B_{p/2}^N}\sup _{y\in B_{q^*/2}^M} \sum _{i\in I_0}\sum _{j\in J_0} y_i a_{ij}^2 x_j}, \end{aligned}$$

where in the last inequality we used the fact that \(k\le m\) and \(l\le n\). In order to estimate the right-hand side of the latter inequality, we consider the following two cases:

Case 1. If \(q^*\ge 2\), then \((q^*/2)^* = q/(2-q)\ge q/2\) and \(\Vert \cdot \Vert _{q/(2-q)} \le \Vert \cdot \Vert _{q/2}\). Consequently,

$$\begin{aligned} \sup _{x\in B_{p/2}^N, y\in B_{q^*/2}^M} \sum _{i\in I_0} \sum _{j\in J_0} y_i a_{ij}^2 x_j&= \Vert A \mathbin {\circ }A:\ell _{p/2}^{J_0}\rightarrow \ell _{q/(2-q)}^{I_0}\Vert \nonumber \\ {}&\le \Vert A \mathbin {\circ }A:\ell _{p/2}^{J_0}\rightarrow \ell _{q/2}^{I_0}\Vert . \end{aligned}$$
(3.10)

Case 2. If \(q^*\le 2\), then \(B_{q^*/2}^M \subset B_1^M\) and \(\Vert \cdot \Vert _{\infty } \le \Vert \cdot \Vert _{q/2}\). Thus,

$$\begin{aligned} \nonumber \sup _{x\in B_{p/2}^N, y\in B_{q^*/2}^M} \sum _{i\in I_0}\sum _{j\in J_0} y_i a_{ij}^2 x_j&\le \sup _{u\in B_{p/2}^N, v\in B_{1}^M} \sum _{i\in I_0} \sum _{j\in J_0} v_i a_{ij}^2 u_j \nonumber \\&= \Vert A \mathbin {\circ }A:\ell _{p/2}^{J_0}\rightarrow \ell _{\infty }^{I_0}\Vert \le \Vert A \mathbin {\circ }A:\ell _{p/2}^{J_0} \rightarrow \ell _{q/2}^{I_0}\Vert .\nonumber \\ \end{aligned}$$
(3.11)

In both cases we have

$$\begin{aligned} D\le \sup _{I_0, J_0} \Vert A\mathbin {\circ }A :\ell ^{J_0}_{p/2} \rightarrow \ell ^{I_0}_{q/2}\Vert ^{1/2}, \end{aligned}$$

so the Gaussian concentration inequality (see, e.g., [41, Chapter 5.1]) implies that for all \(u\ge 0\), \(k\le m\), and \(l \le n\),

$$\begin{aligned} \mathbb {P}(Z_{k,l} \ge \mathbb {E}Z_{k,l} +u)\le \exp \Bigl (-\frac{u^2}{2 \sup _{I_0, J_0} \Vert A\mathbin {\circ }A :\ell ^{J_0}_{p/2} \rightarrow \ell ^{I_0}_{q/2}\Vert }\Bigr ), \end{aligned}$$

so

$$\begin{aligned}{} & {} \mathbb {P}\bigl (Z_{k,l} \ge \max _{k\le m, l\le n} \mathbb {E}Z_{k,l} + \sqrt{2\ln (mn)} u \sup _{I_0, J_0} \Vert A\mathbin {\circ }A :\ell ^{J_0}_{p/2} \rightarrow \ell ^{I_0}_{q/2}\Vert ^{1/2}\bigr )\\{} & {} \quad \le \exp (-u^2\ln (mn)). \end{aligned}$$

This, together with the union bound, implies that for \(u\ge \sqrt{2}\), we have

$$\begin{aligned}{} & {} \mathbb {P}\bigl (\max _{k\le m, l\le n} Z_{k,l} \ge \max _{k\le m, l\le n}\mathbb {E}Z_{k,l} + \sqrt{2\ln (mn)} u \sup _{I_0, J_0} \Vert A\mathbin {\circ }A :\ell ^{J_0}_{p/2} \rightarrow \ell ^{I_0}_{q/2}\Vert ^{1/2}\bigr ) \\{} & {} \quad \le mne^{-u^2\ln (mn)} = \exp \Bigl (- (u^2-1)\ln (mn) \Bigr ) \le e^{-u^2/2}. \end{aligned}$$

Hence, by Lemma 2.6 and the estimate in (3.9),

$$\begin{aligned} \mathbb {E}\! \max _{k\le m, l\le n} Z_{k,l}&\le \max _{k\le m, l\le n}\mathbb {E}Z_{k,l} \\ {}&\quad + \sqrt{2\ln (mn)}\Bigl (\sqrt{2} + \frac{1}{e\sqrt{2}}\Bigr ) \sup _{I_0, J_0} \Vert A\mathbin {\circ }A :\ell ^{J_0}_{p/2} \rightarrow \ell ^{I_0}_{q/2}\Vert ^{1/2}\\&\le \bigl (2.4\sqrt{\ln (mn)} +8\sqrt{\ln M} + \sqrt{2/\pi }\bigr )\sup _{I_0, J_0} \Vert A\mathbin {\circ }A :\ell ^{J_0}_{p/2} \rightarrow \ell ^{I_0}_{q/2}\Vert ^{1/2}\\&\quad + \bigl (8\sqrt{\ln N} +2 \sqrt{2/\pi }\bigr ) \sup _{I_0, J_0} \Vert (A\mathbin {\circ }A)^T :\ell ^{I_0}_{q^*/2} \rightarrow \ell ^{J_0}_{p^*/2}\Vert ^{1/2}. \end{aligned}$$

Recalling (3.7) yields the assertion. \(\square \)

3.2 Coupling

In this subsection we use contraction principles and the coupling described in Lemma 2.21 to prove Corollaries 1.5 and 1.6, and Proposition 1.16. Below we state more general versions of the corollaries akin to Theorem 1.3 (the versions from the introduction follow by setting \(M=m\), \(N=n\)).

Theorem 3.3

(General version of Corollary 1.5) Assume that \(m\le M\), \(n\le N\), \(1\le p,q \le \infty \), and \(X=(X_{ij})_{i\le M, j\le N}\) has independent mean-zero entries taking values in \([-1,1]\). Then

$$\begin{aligned}{} & {} \mathbb {E}\sup _{I, J} \Vert X_A:\ell _p^{J} \rightarrow \ell _q^{I} \Vert =\mathbb {E}\sup _{I, J} \sup _{x\in B_p^ {J}} \sup _{y\in B_{q^*}^ {I}} \sum _{i\in I} \sum _{j\in J} y_i a_{ij}X_{ij}x_j \\{} & {} \quad \le \ln (en)^{1/p^*} \ln (em)^{1/q} \\ {}{} & {} \qquad \cdot \Bigl [ \bigl (2.4\sqrt{2\pi } \sqrt{\ln (mn)}+8\sqrt{2\pi }\sqrt{\ln M}+2\bigr ) \sup _{I,J} \Vert A\mathbin {\circ }A :\ell ^ {J}_{p/2} \rightarrow \ell ^{ I}_{q/2}\Vert ^{1/2}\\{} & {} \qquad +\bigl (8\sqrt{2\pi }\sqrt{\ln N} +4\bigr ) \sup _{I,J} \Vert (A\mathbin {\circ }A)^T :\ell ^{ I}_{q^*/2} \rightarrow \ell ^ {J}_{p^*/2}\Vert ^{1/2} \Bigr ], \end{aligned}$$

where the suprema are taken over all sets \(I\subset \{1,\ldots ,M\}\), \(J\subset \{1,\ldots ,N\}\) such that \(|I|=m\), \(|J|=n\).

Remark 3.4

(Symmetrization of entries of a random matrix) Let \({\widetilde{Z}}\) be an independent copy of a random matrix Z with mean 0 entries. Then for any norm \(\Vert \cdot \Vert \), including the operator norm from \(\ell _p^n\) to \(\ell _q^m\), we have by Jensen’s inequality

$$\begin{aligned} \mathbb {E}\Vert Z\Vert = \mathbb {E}\Vert Z-\mathbb {E}{\widetilde{Z}}\Vert \le \mathbb {E}\Vert Z-{\widetilde{Z}}\Vert \le \mathbb {E}\Vert Z\Vert +\mathbb {E}\Vert {\widetilde{Z}}\Vert =2\mathbb {E}\Vert Z\Vert . \end{aligned}$$

Therefore, in many cases we may simply assume that we deal with matrices with symmetric (not only mean 0) entries. For example, in the setting of Theorem 3.3, the entries of \(X-{\widetilde{X}}\) are symmetric and take values in \([-2,2]\), so it suffices to prove the assertion of this theorem (with a constant half as large on the right-hand side) under the additional assumption that the entries of the given random matrix are symmetric.

Proof of Theorem 3.3

By Remark 3.4 we may and do assume that the entries of X are symmetric—in this case it suffices to prove the assertion with a constant half as large.

Since the entries of X are independent and symmetric, X has the same distribution as \((\varepsilon _{ij}|X_{ij}|)_{i,j}\), where \((\varepsilon _{ij})_{i\le M, j\le N}\) is a random matrix with i.i.d. Rademacher entries, independent of all other random variables. Thus, the contraction principle (see Lemma  2.7) applied conditionally yields (below the suprema are taken over all sets \(I\subset \{1,\ldots ,M\}\), \(J\subset \{1,\ldots ,N\}\) such that \(|I|=m\), \(|J|=n\), and over all \(x\in B_p^ {J}, y\in B_{q^*}^ {I}\), and the sums run over all \(i\in I\) and \(j\in J\))

$$\begin{aligned} \mathbb {E}\sup \sum _{I,J} y_ia_{ij} X_{ij} x_j&= \mathbb {E}\sup \sum _{I,J} y_ia_{ij} \varepsilon _{ij}\bigl |X_{ij} \bigr | x_j \le \mathbb {E}\sup \sum _{I,J} y_ia_{ij} \varepsilon _{ij}x_j \\&= \sqrt{\frac{\pi }{2}} \mathbb {E}\sup \sum _{I,J} y_ia_{ij} \varepsilon _{ij} \mathbb {E}|g_{ij}|x_j \le \sqrt{\frac{\pi }{2}} \mathbb {E}\sup \sum _{I,J} y_ia_{ij} \varepsilon _{ij} |g_{ij}|x_j \\&= \sqrt{\frac{\pi }{2}} \mathbb {E}\sup \sum _{I,J} y_ia_{ij} g_{ij}x_j, \end{aligned}$$

and the assertion follows from Theorem 1.3. \(\square \)
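As a quick sanity check, take \(p=1\) and \(q=\infty \) in Theorem 3.3, so that the left-hand side equals \(\mathbb {E}\max _{i\le M, j\le N} |a_{ij}X_{ij}|\). Since \(1/p^*=1/q=0\), the logarithmic prefactor disappears and both Hadamard norms reduce to \(\max _{i\le M, j\le N} |a_{ij}|\), so the theorem gives

$$\begin{aligned} \mathbb {E}\max _{i\le M, j\le N} |a_{ij}X_{ij}| \lesssim \bigl (\sqrt{\ln (mn)} + \sqrt{\ln M} + \sqrt{\ln N}\bigr ) \max _{i\le M, j\le N} |a_{ij}|, \end{aligned}$$

which agrees, up to the order of the logarithmic factor, with what Hoeffding’s inequality (Lemma 2.13) and a union bound give directly for bounded mean-zero entries.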

Theorem 3.5

(General version of Corollary 1.6) Assume that \(K,L >0\), \(r\in (0,2]\), \(m\le M\), \(n\le N\), \(1\le p,q \le \infty \), and \(X=(X_{ij})_{i\le M, j\le N}\) has independent mean-zero entries satisfying

$$\begin{aligned} \mathbb {P}(|X_{ij}|\ge t) \le Ke^{-t^r/L} \qquad \text {for all } t\ge 0,\, i\le M,\,j\le N. \end{aligned}$$

Then

$$\begin{aligned}{} & {} \mathbb {E}\sup _{I, J} \Vert X_A:\ell _p^{J} \rightarrow \ell _q^{I} \Vert =\mathbb {E}\sup _{I, J} \sup _{x\in B_p^ {J}} \sup _{y\in B_{q^*}^ {I}} \sum _{i\in I} \sum _{j\in J} y_i a_{ij}X_{ij}x_j \\{} & {} \quad \lesssim _{r,K,L} (\ln n)^{1/p^*} (\ln m)^{1/q} \ln (MN)^{\frac{1}{r} - \frac{1}{2}} \\{} & {} \qquad \cdot \Bigl [ \bigl ( \sqrt{\ln (mn)}+\sqrt{\ln M}\bigr ) \sup _{I,J} \Vert A\mathbin {\circ }A :\ell ^ {J}_{p/2} \rightarrow \ell ^{ I}_{q/2}\Vert ^{1/2}\\{} & {} \qquad +\sqrt{\ln N} \sup _{I,J} \Vert (A\mathbin {\circ }A)^T :\ell ^{ I}_{q^*/2} \rightarrow \ell ^ {J}_{p^*/2}\Vert ^{1/2} \Bigr ], \end{aligned}$$

where the suprema are taken over all sets \(I\subset \{1,\ldots ,M\}\), \(J\subset \{1,\ldots ,N\}\) such that \(|I|=m\), \(|J|=n\).

Proof

Let \({\widetilde{X}}\) be an independent copy of X. Then

$$\begin{aligned} \mathbb {P}(|X_{ij}-{\widetilde{X}}_{ij}|\ge t)&\le \mathbb {P}(|X_{ij}|\ge t/2 \text { or } |{\widetilde{X}}_{ij}|\ge t/2) \\&\le 2 \mathbb {P}(|X_{ij}|\ge t/2) \le 2Ke^{-t^r/(2^rL)}. \end{aligned}$$

This means that the symmetric matrix \(X-{\widetilde{X}}\) satisfies the assumptions of Theorem 3.5 (with K and L replaced by 2K and \(2^r L\)). Hence, due to Remark 3.4, we may and do assume that the entries of X are symmetric.

Take the unique positive parameter s satisfying \(\frac{1}{r} = \frac{1}{2}+\frac{1}{s}\). For \(i\le M\), \(j\le N\), let \(g_{ij}\) be i.i.d. standard Gaussian variables, independent of all other variables, and let \(Y_{ij}\) be i.i.d. non-negative Weibull random variables with shape parameter s and scale parameter 1 (i.e., \(\mathbb {P}(Y_{ij}\ge t)=e^{-t^s}\) for \(t\ge 0\)), independent of all other variables. (In the case \(r=2\), we have \(s=\infty \) and then \(Y_{ij}=1\) almost surely.) Take

$$\begin{aligned} (U_{ij})_{i\le M, j\le N} \overset{d}{\sim }(|X_{ij}|)_{i\le M, j\le N}, \qquad (V_{ij})_{i\le M, j\le N}\overset{d}{\sim }( |g_{ij}|Y_{ij})_{i\le M, j\le N} \end{aligned}$$

as in Lemma 2.21 (we pick a pair \((U_{ij}, V_{ij})\) separately for every (ij), and then take such a version of each pair that the system of MN random pairs \((U_{ij}, V_{ij})\) is independent).

Let \((\varepsilon _{ij})_{i\le M, j\le N}\) be a random matrix with i.i.d. Rademacher entries, independent of all other random variables. Since the entries of X are symmetric and independent, X has the same distribution as \((\varepsilon _{ij}|X_{ij}|)_{ij}\). By Lemma 2.21 we know that

$$\begin{aligned} U_{ij}\le (8L)^{1/r}\Bigl (\Bigl (\frac{\ln (K/c)}{4}\Bigr )^{1/r} +V_{ij} \Bigr ) \lesssim _{r,K,L} 1+V_{ij} \qquad \text {a.s.} \end{aligned}$$

We use the contraction principle conditionally for \(\mathbb {E}_{\varepsilon }\), i.e., for \(U_{ij}\)’s and \(V_{ij}\)’s fixed. More precisely, we apply Lemma 2.7 to the space \(\textbf{X}\) of all \(M\times N\) matrices with real coefficients, equipped with the norm

$$\begin{aligned} \Vert (M_{ij})_{i\le M,j\le N}\Vert {:=}\sup _{I,J} \bigl \Vert (M_{ij})_{i\in I,j\in J}:\ell _p^J \rightarrow \ell _q^I \bigr \Vert = \sup \sum _{I,J} y_iM_{ij}x_j \end{aligned}$$

(where the first supremum is taken over all sets \(I\subset \{1,\ldots ,M\}\), \(J\subset \{1,\ldots ,N\}\) such that \(|I|=m\), \(|J|=n\); recall that the second supremum is taken over all sets IJ as in the first supremum, and over all \(x\in B_p^ {J}, y\in B_{q^*}^ {I}\), and the sum runs over all \(i\in I\) and \(j\in J\)); note that we identify \(\textbf{X}\) with \(\mathbb {R}^{MN}\) (and MN plays the role of n from Lemma 2.7). We apply the contraction principle of Lemma 2.7 (conditionally, with the values of \(U_{ij}\)’s and \(V_{ij}\)’s fixed) with coefficients \(\alpha _{ij}{:=}\frac{U_{ij}}{C(r,K,L)(1+V_{ij})}\) and points \(\textbf{x}_{ij} {:=}\bigl (a_{kl}C(r,K,L)(1+V_{kl}){\textbf{1}}_{\{(k,l)=(i,j)\}}\bigr )_{kl} \in \textbf{X}\) to get

$$\begin{aligned} \mathbb {E}\sup \sum _{I,J} y_ia_{ij} X_{ij} x_j&= \mathbb {E}\sup \sum _{I,J} y_ia_{ij} \varepsilon _{ij} U_{ij} x_j \nonumber \\&\le C(r,K,L)\, \mathbb {E}\sup \sum _{I,J} y_ia_{ij} \varepsilon _{ij} (1+V_{ij}) x_j \nonumber \\&\le C(r,K,L) \Bigl ( \mathbb {E}\sup \sum _{I,J} y_ia_{ij} \varepsilon _{ij} x_j + \mathbb {E}\sup \sum _{I,J} y_ia_{ij} \varepsilon _{ij} V_{ij} x_j \Bigr ). \end{aligned}$$
(3.12)

We may estimate the first term using Theorem 3.3 applied to the matrix \((\varepsilon _{ij})_{i\le M, j\le N}\) as follows,

$$\begin{aligned} \mathbb {E}\sup \sum _{I,J} y_ia_{ij} \varepsilon _{ij}x_j&\lesssim \ln (en)^{1/p^*} \ln (em)^{1/q}\nonumber \\&\qquad \cdot \Bigl [ \bigl ( \sqrt{\ln (mn)}+\sqrt{\ln M}\bigr ) \sup _{I,J} \Vert A\mathbin {\circ }A :\ell ^ {J}_{p/2} \rightarrow \ell ^{ I}_{q/2}\Vert ^{1/2} \nonumber \\&\qquad +\sqrt{\ln N} \sup _{I,J} \Vert (A\mathbin {\circ }A)^T :\ell ^{ I}_{q^*/2} \rightarrow \ell ^ {J}_{p^*/2}\Vert ^{1/2} \Bigr ]. \end{aligned}$$
(3.13)

Recall that \((\varepsilon _{ij} V_{ij})_{i\le M, j\le N}\overset{d}{\sim }(\varepsilon _{ij}g_{ij}Y_{ij})_{i\le M, j\le N}\) and that \(Y_{ij}\ge 0\) almost surely. Next we again use the contraction principle (applied conditionally for \(\mathbb {E}_\varepsilon \), i.e. for fixed \(Y_{ij}\)’s and \(g_{ij}\)’s) and get

$$\begin{aligned} \mathbb {E}\sup \sum _{I,J} y_i a_{ij} \varepsilon _{ij}V_{ij}x_j&= \mathbb {E}\sup \sum _{I,J} y_i a_{ij} \varepsilon _{ij}g_{ij} Y_{ij}x_j \nonumber \\&\le \mathbb {E}_Y\max _{i\le M, j\le N} |Y_{ij}| \ \mathbb {E}_{\varepsilon ,g} \sup \sum _{I,J} y_i a_{ij} \varepsilon _{ij}g_{ij} x_j. \end{aligned}$$
(3.14)

Moreover, Theorem 1.3 and Lemma 2.22 (applied with \(r=s\), \(k=MN\), \(Z_{ij}=Y_{ij}\), and \(K=1=L\)), imply

$$\begin{aligned}&\mathbb {E}_Y\max _{i\le M, j\le N} |Y_{ij}| \ \mathbb {E}_{\varepsilon ,g} \sup \sum _{I,J} y_i a_{ij} \varepsilon _{ij}g_{ij} x_j \nonumber \\&\quad \lesssim _{r} \ln (MN)^{1/s} \ \mathbb {E}\sup \sum _{I,J} y_i a_{ij} g_{ij} x_j \nonumber \\&\quad \lesssim \ln (MN)^{\frac{1}{r} -\frac{1}{2}} (\ln n)^{1/p^*} (\ln m)^{1/q} \nonumber \\&\qquad \cdot \Bigl [ \bigl ( \sqrt{\ln (mn)}+\sqrt{\ln M}\bigr ) \sup _{I,J} \Vert A\mathbin {\circ }A :\ell ^ {J}_{p/2} \rightarrow \ell ^{ I}_{q/2}\Vert ^{1/2} \nonumber \\&\qquad +\sqrt{\ln N} \sup _{I,J} \Vert (A\mathbin {\circ }A)^T :\ell ^{ I}_{q^*/2} \rightarrow \ell ^ {J}_{p^*/2}\Vert ^{1/2} \Bigr ]. \end{aligned}$$
(3.15)

Combining the estimates in (3.12)–(3.15) yields the assertion. \(\square \)

Finally, we prove that these estimates of the operator norms translate into tail bounds.

Proof of Proposition 1.16

Since (1.23) implies (1.24) (by Lemma 2.18), it suffices to prove inequality (1.23). By a symmetrization argument similar to the one from the first paragraph of the proof of Theorem 3.5, we may and will assume that X has independent and symmetric entries satisfying (1.21). By assumption (1.21) and the inequality \(2(a+b)^r\ge a^r+b^r\), we have for every \(t\ge 0\),

$$\begin{aligned}{} & {} \mathbb {P}\bigl ((2L)^{-1/r}|X_{ij}| \ge t+ (\ln K)^{1/r}\bigr ) \\{} & {} \quad \le K\exp \Bigl ( -2\bigl ( t+ (\ln K)^{1/r} \bigr )^r \Bigr )\le K \exp \bigl ( -t^r -\ln K \bigr ) = e^{-t^r}, \end{aligned}$$

so (as in the proof of Lemma 2.21) there exists a random matrix \((Y_{ij})_{i\le m, j\le n}\) with i.i.d. entries with the symmetric Weibull distribution with shape parameter r and scale parameter 1 (i.e., \(\mathbb {P}( |Y_{ij}| \ge t) = e^{-t^r}\) for \(t\ge 0\)) satisfying

$$\begin{aligned} |X_{ij}| \le (2L)^{1/r} \bigl ( (\ln K)^{1/r} +|Y_{ij}| \bigr ) \lesssim _{r,K,L} 1+Y_{ij} \qquad \text {a.s.} \end{aligned}$$
(3.16)

Let \((\varepsilon _{ij})_{i\le m, j\le n}\) be a matrix of independent Rademacher random variables independent of all others, and let \(\Vert \cdot \Vert \) denote the operator norm from \(\ell _{p}^n\) to \(\ell _q^m\). Let \(E_{ij}\) be a matrix with 1 at the intersection of ith row and jth column and with other entries 0. The contraction principle (i.e., Lemma 2.7) applied conditionally, (3.16), and the triangle inequality yield for any \(\rho \ge 1\),

$$\begin{aligned}&\biggl ( \mathbb {E}\Bigl \Vert \sum _{i=1}^m \sum _{j=1}^n X_{ij} a_{ij}E_{ij} \Bigr \Vert ^\rho \biggr )^{1/\rho } \le \biggl ( \mathbb {E}\Bigl \Vert \sum _{i,j} \varepsilon _{ij}|X_{ij}| a_{ij}E_{ij} \Bigr \Vert ^\rho \biggr )^{1/\rho } \\&\quad \lesssim _{r,K,L} \biggl ( \mathbb {E}\Bigl \Vert \sum _{i,j} \varepsilon _{ij} a_{ij}E_{ij} \Bigr \Vert ^\rho \biggr )^{1/\rho } + \biggl ( \mathbb {E}\Bigl \Vert \sum _{i,j} \varepsilon _{ij}|Y_{ij}| a_{ij}E_{ij} \Bigr \Vert ^\rho \biggr )^{1/\rho } \\&\quad = \biggl ( \mathbb {E}\Bigl \Vert \sum _{i,j} \varepsilon _{ij} a_{ij}E_{ij} \Bigr \Vert ^\rho \biggr )^{1/\rho } + \biggl ( \mathbb {E}\Bigl \Vert \sum _{i,j} Y_{ij} a_{ij}E_{ij} \Bigr \Vert ^\rho \biggr )^{1/\rho }. \end{aligned}$$

Therefore, it suffices to prove (1.23) for random matrices \((Y_{ij})_{ij}\) and \((\varepsilon _{ij})_{ij}\) instead of X.

Since by assumption \(K,L\ge 1\), both random matrices \((Y_{ij})_{ij}\) and \((\varepsilon _{ij})_{ij}\) satisfy (1.21), so for them inequality (1.22) holds. By the comparison of weak and strong moments [38, Theorem 1.1] (note that the random variables \(Y_{ij}\) satisfy the assumption \(\Vert Y_{ij}\Vert _{2\,s}\le \alpha \Vert Y_{ij}\Vert _s\) for all \(s\ge 2\) with \(\alpha =2^{1/r}\) by [38, Remark 1.5]), we have

$$\begin{aligned}{} & {} \biggl ( \mathbb {E}\Bigl \Vert \sum _{i,j} Y_{ij} a_{ij}E_{ij} \Bigr \Vert ^\rho \biggr )^{1/\rho } = \biggl ( \mathbb {E}\sup _{x\in B_p^n,\ y\in B_{q^*}^m} \Bigl |\sum _{i,j} y_iY_{ij} a_{ij}x_{j} \Bigr |^\rho \biggr )^{1/\rho } \nonumber \\{} & {} \quad \lesssim _r \mathbb {E}\sup _{x\in B_p^n,\ y\in B_{q^*}^m} \sum _{i,j} y_iY_{ij} a_{ij}x_{j} + \sup _{x\in B_p^n,\ y\in B_{q^*}^m} \biggl (\mathbb {E}\Bigl | \sum _{i,j} y_iY_{ij} a_{ij}x_{j} \Bigr |^{\rho } \biggr )^{1/\rho }. \end{aligned}$$
(3.17)

Because of inequality (1.22), the first summand on the right-hand side may be estimated by \(\gamma D\). Lemma 2.19 and the implication (i)\(\implies \)(ii) from Lemma 2.18 yield

$$\begin{aligned} \biggl (\mathbb {E}\Bigl | \sum _{i,j} y_iY_{ij} a_{ij}x_{j} \Bigr |^{\rho } \biggr )^{1/\rho } \lesssim _{r,K,L} \ \rho ^{1/r} \sqrt{ \sum _{i,j} y_i^2a_{ij}^2x_{j}^2}. \end{aligned}$$

Moreover, by (3.10) and (3.11) (used with \(m=M\) and \(n=N\)) and our assumption that \(\Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2} \le D\),

$$\begin{aligned} \sup _{x\in B_p^n,\ y\in B_{q^*}^m} \sqrt{ \sum _{i,j} y_i^2a_{ij}^2x_{j}^2} \le D, \end{aligned}$$

so the second summand on the right-hand side of (3.17) is bounded above (up to a multiplicative constant depending only on r, K, and L) by \(\rho ^{1/r}D\). Thus, (1.23) indeed holds for the random matrix \((Y_{ij})_{ij}\) instead of X. A similar reasoning shows that the same inequality holds also for the random matrix \(( \varepsilon _{ij})_{ij}\) (one may also simply use the Khintchine–Kahane inequality and assumption (1.22)). \(\square \)

4 Proofs of further results

4.1 Gaussian random variables

Proof of Proposition 1.7

Fix \(1\le p\le 2\) and \(1\le q\le \infty \). Let K be the set defined in Lemma 2.3 for which \(B_p^n\subset \ln (en)^{1/p^*} K\). Then

$$\begin{aligned} \Vert G_A :\ell ^n_p\rightarrow \ell ^m_q\Vert = \sup _{x\in B_p^n} \Vert G_A x\Vert _q \le \ln (en)^{1/p^*}\! \sup _{x\in {\text {Ext}}(K)}\! \Vert G_A x\Vert _q, \end{aligned}$$
(4.1)

where \({\text {Ext}}(K)\) is the set of extreme points of K. We shall now estimate the expected value of the right-hand side of (4.1).

To this end, we first consider a fixed \(x=(x_j)_{j=1}^n \in {\text {Ext}}(K)\). Then there exists a non-empty index set \(J\subset \{1,\dots ,n\}\) of cardinality \(k\le n\) such that \(x_j = \frac{\pm 1}{k^{1/p}}\) for \(j\in J\) and \(x_j=0\) for \(j\notin J\). We have

$$\begin{aligned} \Vert G_A x\Vert _q = \Bigl \Vert \Bigl ( \sum _{j=1}^n a_{ij}g_{ij} x_j \Bigr )_{i=1}^m \Bigr \Vert _q = \Bigl ( \sum _{i=1}^m \Bigl | \sum _{j=1}^n a_{ij}g_{ij} x_j\Bigr |^q\Bigr )^{1/q}. \end{aligned}$$
(4.2)

Let us estimate the Lipschitz constant of the function

$$\begin{aligned} z=(z_{ij})_{ij} \mapsto \Bigl \Vert \Bigl ( \sum _{j=1}^n a_{ij}z_{ij} x_j \Bigr )_{i=1}^m \Bigr \Vert _q = \sup _{y\in B_{q^*}^m}\sum _{i=1}^m \sum _{j=1}^n y_ia_{ij}z_{ij}x_j. \end{aligned}$$
(4.3)

It follows from the Cauchy–Schwarz inequality (used in \(\mathbb {R}^{m\times n}\)) that

$$\begin{aligned} \sup _{y\in B_{q^*}^m} \sum _{i=1}^m \sum _{j=1}^n y_ia_{ij}z_{ij}x_j&\le \Vert z\Vert _2 \sqrt{\sup _{y\in B_{q^*}^m} \sum _{i=1}^m \sum _{j=1}^n y_{i}^2a_{ij}^2x_j^2} \nonumber \\&=\Vert z\Vert _2 \frac{1}{k^{1/p}}\sqrt{ \sup _{y\in B^m_{q^*/2}} \sum _{i=1}^m \sum _{j\in J} y_i a_{ij}^2 } = \Vert z\Vert _2\frac{ b_J}{k^{1/p}}, \end{aligned}$$
(4.4)

where we put

$$\begin{aligned} b_J{:=}\sqrt{ \sup _{y\in B^m_{q^*/2}} \sum _{i=1}^m \sum _{j\in J} y_i a_{ij}^2 }. \end{aligned}$$

This shows that the function defined by (4.3) is \(\frac{ b_J}{k^{1/p}}\)-Lipschitz continuous. Therefore, by the Gaussian concentration inequality (see, e.g., [41, Chapter 5.1]), for any \(u\ge 0\),

$$\begin{aligned} \mathbb {P}\bigl (\Vert G_A x\Vert _q \ge \mathbb {E}\Vert G_A x\Vert _q + u\bigr ) \le \exp \bigl ( - \frac{k^{2/p} u^2}{2b_J^2}\bigr ). \end{aligned}$$
(4.5)

We shall transform this inequality into a form which is more convenient to work with. We want to estimate \(\mathbb {E}\Vert G_A x\Vert _q \) independently of x and get rid of the dependence on J and p on the right-hand side. By (4.2) and the fact that \(x\in {\text {Ext}}(K)\subset B_p^n\), we obtain

$$\begin{aligned} \mathbb {E}\Vert G_A x\Vert _q&\le (\mathbb {E}\Vert G_A x\Vert _q^q)^{1/q} = \gamma _q \Bigl ( \sum _{i=1}^m \Bigl |\sum _{j=1}^n a_{ij}^2 x_j^2\Bigr |^{q/2}\Bigr )^{1/q}\\&\le \gamma _q \sup _{z\in B_p^n} \Bigl ( \sum _{i=1}^m \Bigl | \sum _{j=1}^n a_{ij}^2 z_j^2\Bigr |^{q/2}\Bigr )^{1/q} =\gamma _q \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2} {=:}a. \end{aligned}$$

We use the definition of \(b_J\), then interchange the sums, use the triangle inequality, and then the inequality between the arithmetic mean and the power mean of order \(p^*/2\ge 1\) (recall that \(|J|=k\) and \(p\le 2\)) to obtain

$$\begin{aligned} k^{2/p^*-1} b_J^2&= k^{2/p^*-1} \sup _{y\in B^m_{q^*/2}} \sum _{i=1}^m \sum _{j\in J} a_{ij}^2 y_i = k^{2/p^*-1} \sup _{y\in B^m_{q^*/2}} \sum _{j\in J} \Bigl | \sum _{i=1}^m a_{ij}^2 y_i \Bigr | \nonumber \\&\le \sup _{y\in B^m_{q^*/2}} \Bigl ( \sum _{j\in J} \Bigl | \sum _{i=1}^m a_{ij}^2 y_i \Bigr | ^{p^*/2}\Bigr )^{2/p^*} \le \sup _{y\in B^m_{q^*/2}} \Bigl ( \sum _{j=1}^n\Bigl | \sum _{i=1}^m a_{ij}^2 y_i \Bigr | ^{p^*/2}\Bigr )^{2/p^*} \nonumber \\&= \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert {=:}b^2. \end{aligned}$$
(4.6)

The two inequalities above, together with inequality (4.5) (applied with \(u=k^{1/p^* -1/2} b_J \sqrt{2\ln (en)} s\)), imply that

$$\begin{aligned} \mathbb {P}\bigl (\Vert G_A x\Vert _q \ge a + b \sqrt{2\ln (en)}\, s \bigr )&\le \exp \bigl ( - k^{2/p+2/p^*-1} \ln (en)s^2\bigr ) \nonumber \\&= \exp \bigl ( - k \ln (en)s^2\bigr ) \end{aligned}$$
(4.7)

holds for any \(s\ge 0\) and all \(x\in {\text {Ext}}(K)\) with support of cardinality k.

For any \(k\le n\), there are \(2^k \left( {\begin{array}{c}n\\ k\end{array}}\right) \le 2^k n^k \le \exp ( k \ln (en))\) vectors in \({\text {Ext}}(K)\) with support of cardinality k. Therefore, using a union bound together with (4.7), we see that, for all \(s\ge \sqrt{2}\),

$$\begin{aligned}{} & {} \mathbb {P}\bigl (\sup _{x\in {\text {Ext}}K}\! \Vert G_A x\Vert _q \ge a + b \sqrt{2\ln (en)} s \bigr )\le \sum _{k=1}^n \exp ( -k \ln (en)(s^2-1))\\{} & {} \quad \le n \exp ( - \ln (en)(s^2-1)) = n(en)^{-s^2+1}\le e^{-s^2+1}. \end{aligned}$$

Hence, by Lemma 2.6 (applied with \(s_0{:=}\sqrt{2}\), \(\alpha {:=}e\), \(\beta {:=}1\), and \(r{:=}2\)),

$$\begin{aligned} \mathbb {E}\! \sup _{x\in {\text {Ext}}K}\! \Vert G_A x\Vert _q&\le a + b \sqrt{2\ln (en)} \Bigl (\sqrt{2} + e \frac{e^{-2}}{2\sqrt{2}}\Bigr ) \le a + 2.2 b \sqrt{\ln (en)}. \end{aligned}$$

Recalling (4.1) and the definitions of a and b yields the assertion. \(\square \)
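For instance, for the all-ones matrix \(a_{ij}\equiv 1\) and \(p=q=2\) we have \(\gamma _2=1\), \(\Vert A\mathbin {\circ }A :\ell ^n_{1} \rightarrow \ell ^m_{1}\Vert ^{1/2}=\sqrt{m}\), and \(\Vert (A\mathbin {\circ }A)^T :\ell ^m_{1} \rightarrow \ell ^n_{1}\Vert ^{1/2}=\sqrt{n}\), so Proposition 1.7 yields

$$\begin{aligned} \mathbb {E}\Vert G :\ell ^n_2 \rightarrow \ell ^m_2\Vert \lesssim \sqrt{\ln (en)}\, \sqrt{m} + \ln (en)\, \sqrt{n}, \end{aligned}$$

which recovers the classical estimate \(\mathbb {E}\Vert G\Vert \lesssim \sqrt{m}+\sqrt{n}\) for an i.i.d. standard Gaussian matrix up to logarithmic factors.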

We now turn to the special case \(q=1\).

Proof of Proposition 1.8

Since the first part of this proof works for general \(q\ge 1\), we do not restrict our attention to \(q=1\) for now. First of all,

$$\begin{aligned} \mathbb {E}\Vert G_A :\ell _{p}^n \rightarrow \ell _q^m \Vert \le \bigl (\mathbb {E}\Vert G_A :\ell _{p}^n \rightarrow \ell _q^m \Vert ^q\bigr )^{1/q} = \Bigl ( \mathbb {E}\sup _{x\in B_{p}^n} \sum _{i=1}^m |\langle X_i,x \rangle |^q\Bigr )^{1/q}, \end{aligned}$$

where \(X_i=(a_{ij}g_{ij})_{j=1}^n\) is the i-th row of the matrix \(G_A\). Centering this expression gives

$$\begin{aligned} \mathbb {E}\sup _{x\in B_{p}^n} \sum _{i=1}^m |\langle X_i,x \rangle |^q&\le \mathbb {E}\sup _{x\in B_{p}^n} \Big [\sum _{i=1}^m |\langle X_i,x \rangle |^q - \mathbb {E}|\langle X_i,x \rangle |^q \Big ]\nonumber \\&\quad + \sup _{x\in B_{p}^n} \sum _{i=1}^m \mathbb {E}|\langle X_i, x \rangle |^q. \end{aligned}$$
(4.8)

We first take care of the second term on the right-hand side of (4.8). We have

$$\begin{aligned} \sup _{x\in B_{p}^n} \sum _{i=1}^m \mathbb {E}|\langle X_i, x \rangle |^q&= \gamma _q^q \sup _{x\in B_{p}^n} \sum _{i=1}^m \Bigl (\sum _{j=1}^na_{ij}^2x_j^2\Bigr )^{q/2} \nonumber \\&= \gamma _q^q \sup _{z\in B_{p/2}^n} \Big \Vert \Bigl (\sum _{j=1}^na_{ij}^2z_j\Bigr )_{i\le m} \Big \Vert _{q/2}^{q/2} = \gamma _q^q \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{q/2}. \end{aligned}$$
(4.9)

In order to deal with the first term on the right-hand side of (4.8), we use a symmetrization trick together with the contraction principle. The latter is the reason that we need to work with \(q=1\) here. We start with the symmetrization. Denoting by \({\widetilde{X}}_1,\dots ,{\widetilde{X}}_m\) independent copies of \(X_1, \dots , X_m\) and by \((\varepsilon _i)_{i=1}^m\) a sequence of Rademacher random variables independent of all others, we obtain by Jensen’s and the triangle inequalities that

$$\begin{aligned}&\mathbb {E}\sup _{x\in B_{p}^n} \Bigl [\sum _{i=1}^m |\langle X_i,x \rangle |^q - \mathbb {E}|\langle X_i,x \rangle |^q \Bigr ] = \mathbb {E}\sup _{x\in B_{p}^n} \Big [\sum _{i=1}^m |\langle X_i,x \rangle |^q - \mathbb {E}|\langle {\widetilde{X}}_i,x \rangle |^q \Bigr ]\nonumber \\&\quad \le \mathbb {E}\sup _{x\in B_{p}^n} \Bigl [\sum _{i=1}^m |\langle X_i,x \rangle |^q - |\langle {\widetilde{X}}_i,x \rangle |^q \Bigr ] \nonumber \\&\quad = \mathbb {E}\sup _{x\in B_{p}^n} \Bigl [\sum _{i=1}^m \varepsilon _i( |\langle X_i,x \rangle |^q - |\langle {\widetilde{X}}_i,x \rangle |^q) \Bigr ] \le 2 \cdot \mathbb {E}\sup _{x\in B_{p}^n} \sum _{i=1}^m \varepsilon _i |\langle X_i,x \rangle |^q. \end{aligned}$$
(4.10)

If \(q=1\), we may use the contraction principle (i.e., Lemma 2.8 applied with functions \(\varphi _i(t)=|t|\)) conditionally to obtain

$$\begin{aligned} \mathbb {E}\sup _{x\in B_{p}^n} \sum _{i=1}^m \varepsilon _i |\langle X_i,x \rangle |&\le \mathbb {E}\sup _{x\in B_{p}^n} \sum _{i=1}^m \varepsilon _i\langle X_i,x \rangle \nonumber \\&= \mathbb {E}\sup _{x\in B_{p}^n} \sum _{j=1}^n x_j \sum _{i=1}^m a_{ij}\cdot \varepsilon _i g_{ij} = \mathbb {E}\sup _{x\in B_{p}^n} \sum _{j=1}^n x_j \sum _{i=1}^m a_{ij} g_{ij}. \end{aligned}$$
(4.11)

For \(p > 1\), we have

$$\begin{aligned} \mathbb {E}\sup _{x\in B_{p}^n} \sum _{j=1}^n x_j \sum _{i=1}^m a_{ij} g_{ij}&=\mathbb {E}\Bigl ( \sum _{j=1}^n\Big |\sum _{i=1}^m a_{ij}g_{ij}\Big |^{p^*}\Bigr )^{1/p^*} \nonumber \\&\le \Bigl ( \sum _{j=1}^n \mathbb {E}\Big |\sum _{i=1}^m a_{ij}g_{ij}\Big |^{p^*}\Bigr )^{1/p^*} = \gamma _{p^*} \Bigl (\sum _{j=1}^n \Bigl (\sum _{i=1}^m a_{ij}^2\Bigr )^{p^*/2}\Bigr )^{1/p^*}. \end{aligned}$$
(4.12)

Moreover, we have

$$\begin{aligned} \Bigl (\sum _{j=1}^n \Bigl (\sum _{i=1}^m a_{ij}^2\Bigr )^{p^*/2}\Bigr )^{1/p^*}&= \sup _{\delta \in \{-1,1 \}^m} \Bigl \Vert \Bigl (\sum _{i=1}^ma_{ij}^2\delta _i\Bigr )_{j\le n} \Bigr \Vert _{p^*/2}^{1/2} \nonumber \\&= \bigl \Vert (A\mathbin {\circ }A)^T:\ell _\infty ^m \rightarrow \ell _{p^*/2}^n \Bigr \Vert ^{1/2}. \end{aligned}$$
(4.13)

Inequalities (4.10)–(4.13) give the estimate of the first term on the right-hand side of (4.8). This ends the proof of the upper bound for \(p > 1\).

If \(p = 1\), then letting \(g_1,\ldots ,g_n\) be i.i.d. standard Gaussian random variables, we have

$$\begin{aligned} \mathbb {E}\sup _{x\in B_{p}^n} \sum _{j=1}^n x_j \sum _{i=1}^m a_{ij} g_{ij}&= \mathbb {E}\max _{j\le n} \Big |\sum _{i=1}^m a_{ij} g_{ij}\Big | \nonumber \\&= \mathbb {E}\max _{j\le n} |g_j| b_j \asymp \max _{j\le n} (\sqrt{\ln (j+1)}b_j^{\downarrow {}}), \end{aligned}$$
(4.14)

where the last step follows from Lemmas 2.11 and 2.12 with \(b_j {:=}\Vert (a_{ij})_{i\le m}\Vert _2\), \(j\le n\). Putting together (4.8)–(4.11) and (4.14) completes the proof of the upper bound in the case \(p=1\).

The lower bound in the case \(p>1\) follows from Proposition 5.1 and Corollary 5.2 below. In the case \(p=1\), we use Proposition 5.1, note that

$$\begin{aligned} \mathbb {E}\Vert G_A :\ell _{p}^n \rightarrow \ell _1^m \Vert \ge \mathbb {E}\sup _{x\in B_{p}^n} \sum _{j=1}^n x_j \sum _{i=1}^m a_{ij} g_{ij}, \end{aligned}$$

and use (4.14) to obtain a lower bound. \(\square \)

Now we deal with another special case, the one where \(p=1\).

Proof of Proposition 1.10

Recall that we deal with the range \(p=1\le q\le 2\). Using the structure of extreme points of \(B_1^n\) we get

$$\begin{aligned} \mathbb {E}\Vert G_A :\ell _1^n \rightarrow \ell _q^m\Vert = \mathbb {E}\max _{j\le n} \Vert (a_{ij}g_{ij})_{i\le m}\Vert _q. \end{aligned}$$

Denote \(Z_j = \Vert (a_{ij}g_{ij})_{i\le m}\Vert _q\). By well-known tail estimates of norms of Gaussian variables with values in Banach spaces (see, e.g., [36, Corollary 1] for a more general formulation) we get for all \(t > 0\),

$$\begin{aligned} \mathbb {P}\bigl ( Z_j \ge C(\mathbb {E}Z_j + \sqrt{t} b_j)\bigr )&\le e^{-t}, \end{aligned}$$
(4.15)
$$\begin{aligned} \mathbb {P}\bigl ( Z_j \ge c(\mathbb {E}Z_j + \sqrt{t} b_j)\bigr )&\ge \min (c,e^{-t}), \end{aligned}$$
(4.16)

where c, C are universal positive constants, and

$$\begin{aligned} b_j^2 = \Vert (a_{ij}^2)_{i\le m}\Vert _{q/(2-q)} =\Vert (a_{ij}^2)_{i\le m}\Vert _{(q^*/2)^*} = \sup _{x\in B_{q^*}^m} \sum _{i=1}^m a_{ij}^2x_i^2. \end{aligned}$$
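For the reader's convenience, let us verify the identity \((q^*/2)^* = q/(2-q)\) used here: for \(1<q\le 2\) we have \(q^*=q/(q-1)\), so \(q^*/2\ge 1\) and

$$\begin{aligned} \Bigl (\frac{q^*}{2}\Bigr )^* = \frac{q^*/2}{q^*/2-1} = \frac{q^*}{q^*-2} = \frac{q/(q-1)}{q/(q-1)-2} = \frac{q}{2-q} \end{aligned}$$

(for \(q=1\) both sides equal 1, and for \(q=2\) both sides are infinite).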

Inequality (4.15) shows in particular that the random variables \((Z_j - C \mathbb {E}Z_j)_+\) satisfy

$$\begin{aligned} \mathbb {P}((Z_j - C \mathbb {E}Z_j)_+ \ge t) \le \exp \Bigl (-\frac{t^2}{C^2 b_j^2}\Bigr ) \end{aligned}$$

for all \(t > 0\), thus by Lemma 2.11 we get

$$\begin{aligned} \mathbb {E}\Vert G_A :\ell _1^n \rightarrow \ell _q^m\Vert = \mathbb {E}\max _{j\le n} Z_j&\le C \max _{j\le n} \mathbb {E}Z_j + \mathbb {E}\max _{j\le n} (Z_j-C\mathbb {E}Z_j)_+ \\&\lesssim \Big (\max _{j\le n} \mathbb {E}Z_j + \max _{j\le n} (\sqrt{\ln (j+1)}b_j^{\downarrow {}})\Big ), \end{aligned}$$

which together with the observation (following from Lemma 2.1 and the fact that \(1=p\le q\le 2\)) that

$$\begin{aligned} \mathbb {E}Z_j \le \Big (\mathbb {E}\sum _{i=1}^m |a_{ij}|^q |g_{ij}|^q\Big )^{1/q} = \gamma _q \Vert (a_{ij})_{i\le m}\Vert _q \le \gamma _q \Vert A\mathbin {\circ }A:\ell _{1/2}^n \rightarrow \ell _{q/2}^m\Vert ^{1/2}, \end{aligned}$$

proves the upper estimate of the proposition.

Using comparison of moments of norms of Gaussian random vectors, we also get

$$\begin{aligned} \mathbb {E}\Vert G_A :\ell _1^n \rightarrow \ell _q^m\Vert&\ge \max _{j\le n} \mathbb {E}Z_j \gtrsim \max _{j\le n} (\mathbb {E}Z_j^q)^{1/q} \nonumber \\&= \gamma _q \max _{j\le n} \Vert (a_{ij})_{i\le m}\Vert _q = \gamma _q \Vert A\mathbin {\circ }A:\ell _{1/2}^n \rightarrow \ell _{q/2}^m\Vert ^{1/2}, \end{aligned}$$
(4.17)

so to end the proof it is enough to show that

$$\begin{aligned} \mathbb {E}\Vert G_A :\ell _1^n \rightarrow \ell _q^m\Vert \ge \max _{j\le n}(\sqrt{\ln (j+1)}b_j^{\downarrow {}}). \end{aligned}$$
(4.18)

This will follow by a straightforward adaptation of the argument from the proof of Lemma 2.12. We may and do assume that the sequence \((b_j)_{j\le n}\) is non-increasing in j. By (4.16) we have for any \(j\le n\) and \(k \ge 1\),

$$\begin{aligned} \mathbb {P}(Z_j \ge c\sqrt{\ln (k+1)}b_j) \ge \frac{c'}{k}. \end{aligned}$$

Thus, since \(b_j\ge b_k\) for all \(j\le k\), we have for any \(k\le n\),

$$\begin{aligned} \mathbb {P}(\max _{j\le n} Z_j \ge c\sqrt{\ln (k+1)}b_k )&\ge \mathbb {P}(\exists _{j\le k} \ Z_j \ge c\sqrt{\ln (k+1)}b_j)\\&\ge 1 - (1-c'/k)^k \ge 1- e^{-c'} > 0. \end{aligned}$$

Thus,

$$\begin{aligned} \mathbb {E}\Vert G_A :\ell _1^n \rightarrow \ell _q^m\Vert = \mathbb {E}\max _{j\le n} Z_j \gtrsim \sqrt{\ln (k+1)} b_k. \end{aligned}$$

Taking the maximum over \(k \le n\) gives (4.18) and ends the proof. \(\square \)

4.2 Bounded random variables

Here we show how one can adapt the methods of [9] to prove Proposition 1.14, i.e., a version of Corollary 1.13 in the special case of bounded random variables with better logarithmic terms and with explicit numerical constants. Following [9], we start with a lemma.

Lemma 4.1

Assume that X is as in Proposition 1.14. Let \((b_j)_{j\le n}\in \mathbb {R}^n\) and suppose that \(t_0\) is such that \(\bigl |\sum _{j=1}^n b_j X_{ij}\bigr |\le t_0\) almost surely. Then, for all \(q\ge 2\) and \(0\le t\le {t_0^{2-q}}(4 \sum _{j=1}^n b_j^2)^{-1} \),

$$\begin{aligned} \mathbb {E}\exp \bigl (t\bigl |\sum _{j=1}^n b_j X_{ij}\bigr |^q \bigr ) \le 1 + C^q(q) \, t \, \bigl (\sum _{j=1}^n b_j^2 \bigr )^{q/2}, \end{aligned}$$
(4.19)

where \(C(q) {:=}2(q \Gamma (q/2))^{1/q} \asymp \sqrt{q} \).

Proof

Without loss of generality we may and do assume that \(\sum _{j=1}^n b_j^2=1\).

Since \(q\ge 2\), for \(s\in [0,t_0]\) and \(t\in [0,\frac{1}{4} t_0^{2-q}]\) we have \(ts^q - s^2/2 \le - s^2/4\). Thus, integration by parts, our assumption \(0\le \bigl |\sum _{j=1}^n b_j X_{ij}\bigr |\le t_0\) a.s., and Hoeffding’s inequality (i.e., Lemma 2.13) yield

$$\begin{aligned} \mathbb {E}\exp \bigl (t\bigl |\sum _{j=1}^n b_j X_{ij}\bigr |^q \bigr )&= 1 + qt \int _0^{t_0} s^{q-1} \exp (ts^q) \mathbb {P}\bigl (\bigl |\sum _{j=1}^n b_j X_{ij}\bigr |\ge s \bigr ) ds\\&\le 1 + 2qt \int _0^{t_0} s^{q-1} \exp (ts^q - s^2/2) ds\\&\le 1 + 2qt \int _0^{\infty } s^{q-1} \exp (- s^2/4) ds\\&= 1 + t 2^q q \Gamma (q/2). \end{aligned}$$

\(\square \)
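For the reader's convenience, let us also record why \(C(q)\asymp \sqrt{q}\): by Stirling's formula, \(\Gamma (q/2) \asymp \sqrt{2\pi /(q/2)}\, \bigl (q/(2e)\bigr )^{q/2}\), so

$$\begin{aligned} C(q) = 2\bigl (q\,\Gamma (q/2)\bigr )^{1/q} \asymp \Bigl (\frac{q}{2e}\Bigr )^{1/2} \asymp \sqrt{q} \qquad \text {for } q\ge 2, \end{aligned}$$

since \(q^{1/q}\) and the \(q\)-th root of the polynomial prefactor stay bounded between absolute constants.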

Proof of Proposition 1.14

We start with several reductions. Set

$$\begin{aligned} a&{:=}\Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2} = \max _{j\le n} \Vert (a_{ij})_{i=1}^m\Vert _q, \\ b&{:=}\Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2} = \max _{i\le m} \Vert (a_{ij})_{j=1}^n\Vert _{p^*}. \end{aligned}$$

(The equalities follow from Lemma 2.1, since \(p/2\le 1\le q/2\) and \( q^*/2 \le 1\le p^*/2\)). Let K be the set defined in Lemma 2.3, so that \(B_p^n\subset \ln (en)^{1/p^*} K\). Then

$$\begin{aligned} \Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert = \sup _{x\in B_p^n} \Vert X_A x\Vert _q \le \ln (en)^{1/p^*}\! \sup _{x\in {\text {Ext}}(K)}\! \Vert X_A x\Vert _q, \end{aligned}$$
(4.20)

where \({\text {Ext}}(K)\) is the set of extreme points of K.

Consider first a fixed \(x=(x_j)_{j=1}^n \in {\text {Ext}}(K) \subset B_p^n\), and denote by \(k\le n\) the cardinality of its support. We have

$$\begin{aligned} \Vert X_A x\Vert _q^q =\sum _{i=1}^m \Bigl | \sum _{j=1}^n a_{ij}X_{ij} x_j\Bigr |^q. \end{aligned}$$
(4.21)

Denote

$$\begin{aligned} t_0&{:=}b = \max _{i\le m} \Vert (a_{ij})_{j=1}^n\Vert _{p^*},\\ t&{:=}\frac{t_0^{2-q}}{4 \max _{i\le m} \Vert (a_{ij}x_j)_{j=1}^n\Vert _{2}^2}. \end{aligned}$$

Then, by the boundedness of \(X_{ij}\) and by Hölder’s inequality, for every \(i\le m\),

$$\begin{aligned} \bigl |\sum _{j=1}^n a_{ij}x_j X_{ij}\bigr | \le \sum _{j=1}^n |a_{ij}| |x_j| \le \Vert (a_{ij})_{j=1}^n\Vert _{p^*} \Vert (x_{j})_{j=1}^n\Vert _{p} \le t_0. \end{aligned}$$

We can now apply, for every \(i\le m\), Lemma 4.1 (with t and \(t_0\) as above and with coefficients \(b_{j} = a_{ij}x_j\)). Since the random variables \(\bigl |\sum _{j=1}^n a_{ij}x_j X_{ij}\bigr | \), \(i\le m\), are independent, this yields

$$\begin{aligned} \mathbb {E}\exp \bigl (t \sum _{i=1}^m \bigl |\sum _{j=1}^n a_{ij}x_j X_{ij}\bigr |^q \bigr )&= \prod _{i=1}^m \Bigl [ \mathbb {E}\exp \bigl (t \bigl |\sum _{j=1}^n a_{ij}x_j X_{ij}\bigr |^q \bigr )\Bigr ] \\&\le \prod _{i=1}^m \Bigl (1 + C^q(q) \, t \, \bigl (\sum _{j=1}^n a_{ij}^2x_j^2 \bigr )^{q/2}\Bigr ) \\&\le \exp \Bigl ( C^q(q)\, t\sum _{i=1}^m \bigl (\sum _{j=1}^n a_{ij}^2x_j^2 \bigr )^{q/2}\Bigr ) \le \exp \bigl ( C^q(q)\, t a^q\bigr ), \end{aligned}$$

where in the last step we used the definition of \(a =\Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}\) (and the fact that \(x\in B^n_{p}\)). By Chebyshev’s inequality and (4.21), we have, for every \(s\ge 0\),

$$\begin{aligned} \mathbb {P}\bigl (t \Vert X_{A} x\Vert _q^q \ge \ln \bigl [ \mathbb {E}\exp \bigl (t \sum _{i=1}^m \bigl |\sum _{j=1}^n a_{ij}x_j X_{ij}\bigr |^q \bigr )\bigr ] +sk \bigr ) \le e^{-sk}. \end{aligned}$$

Combining this with the previous estimate yields, for every \(s\ge 0\),

$$\begin{aligned} \mathbb {P}\bigl (\Vert X_{A} x\Vert _q^q \ge C^q(q)\, a^q + \frac{ sk}{t} \bigr ) \le e^{-sk}. \end{aligned}$$

Recall that \(x\in {\text {Ext}}(K)\). Thus, there exists an index set \(J\subset \{1,\dots ,n\}\) of cardinality k such that \(x_j = \frac{\pm 1}{k^{1/p}}\) for \(j\in J\) and \(x_j=0\) for \(j\notin J\). We use the definition of t and the inequality between the arithmetic mean and the power mean of order \(p^*/2\ge 1\) (recall that \(|J|=k\) and \(p\le 2\)) to get

$$\begin{aligned} \frac{1}{4 t} = b^{q-2} \max _{i\le m} \Vert (a_{ij}x_j)_{j=1}^n\Vert _{2}^2&= b^{q-2} k^{-2/p} \max _{i\le m} \sum _{j\in J} a_{ij}^2\\&\le b^{q-2} k^{-2/p + 1 - 2/p^*} \max _{i\le m} \bigl (\sum _{j\in J} |a_{ij}|^{p^*}\bigr )^{2/p^*} \le b^q k^{-1}. \end{aligned}$$
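Here the last inequality uses the definition of \(b\) (so that \(\max _{i\le m} (\sum _{j\in J} |a_{ij}|^{p^*})^{2/p^*}\le b^2\)) together with the identity \(2/p + 2/p^* = 2\), which gives \(k^{-2/p+1-2/p^*} = k^{-1}\).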

Putting everything together, we obtain

$$\begin{aligned} \mathbb {P}\bigl (\Vert X_{A} x\Vert _q^q \ge C^q(q)\, a^q + 4 b^q s \bigr ) \le e^{-sk} \end{aligned}$$
(4.22)

for all \(s\ge 0\) and all \(x\in {\text {Ext}}(K)\) with support of cardinality k.

For any \(k\le n\), there are \(2^k \left( {\begin{array}{c}n\\ k\end{array}}\right) \le 2^k n^k \le \exp ( k \ln (en))\) vectors in \({\text {Ext}}(K)\) with support of cardinality k. Thus, using the union bound and (4.22), we see that, for all \(s\ge 2\),

$$\begin{aligned}&\mathbb {P}\bigl (\sup _{x\in {\text {Ext}}(K)}\!\Vert X_A x\Vert _q^q \ge C^q(q)\, a^q + 4 b^q \ln (en) s \bigr ) \le \sum _{k=1}^n \exp ( -k \ln (en)(s-1))\\&\quad \le n \exp ( - \ln (en)(s-1)) = n(en)^{-s+1}\le e^{-s+1}. \end{aligned}$$

Hence, by Lemma 2.6,

$$\begin{aligned} \mathbb {E}\sup _{x\in {\text {Ext}}(K)} \Vert X_{A} x\Vert _q \le \bigl (\mathbb {E}\sup _{x\in {\text {Ext}}(K)} \Vert X_{A} x\Vert _q^q\bigr )^{1/q}&\le \bigl ( C^q(q)\, a^q + 4 b^q \ln (en) (2 + e\cdot e^{-2}) \bigr )^{1/q}\\&\le C(q)\, a + 10^{1/q}\ln (en)^{1/q}\, b. \end{aligned}$$

Recalling (4.20) and the definitions of a, b, and C(q) yields the assertion. \(\square \)

Remark 4.2

In the unstructured case, for \(X_{ij}\) which are independent, mean-zero, and take values in \([-1,1]\), it is easy to extend (1.2) to the whole range of \(p, q\in [1,\infty ]\) (see [8, 13]). Indeed, for \(p\ge 2\) and \(q\ge 2\),

$$\begin{aligned} \mathbb {E}\Vert X:\ell ^n_p \rightarrow \ell ^m_q\Vert&\le \Vert \ell ^n_p \hookrightarrow \ell ^n_{2}\Vert \cdot \mathbb {E}\Vert X:\ell ^n_2 \rightarrow \ell ^m_q\Vert \\&\lesssim _q n^{1/2-1/p} \cdot \max \{n^{1/2}, m^{1/q}\} = \max \{n^{1-1/p}, n^{1/2-1/p}m^{1/q}\} . \end{aligned}$$

Thus, for \(p\ge 2\) and \(1\le q\le 2\),

$$\begin{aligned} \mathbb {E}\Vert X:\ell ^n_p \rightarrow \ell ^m_q\Vert&\le \mathbb {E}\Vert X:\ell ^n_p \rightarrow \ell ^m_2\Vert \cdot \Vert \ell ^m_{2} \hookrightarrow \ell ^m_q\Vert \\&\lesssim _q \max \{n^{1-1/p}, n^{1/2-1/p}m^{1/2}\} \cdot m^{1/q- 1/2}\\&= \max \{n^{1-1/p}m^{1/q-1/2}, n^{1/2-1/p}m^{1/q}\}. \end{aligned}$$

Suppose now that \(1\le p \le 2\le q \le \infty \) and \(1/p+1/q\le 1\) (i.e., \(q\ge p^*\)). Choose \(\theta \in [0,1]\) and \(r \ge 2\) so that \(\frac{1}{p} = \frac{\theta }{2} + \frac{1-\theta }{1}\) and \(\frac{1}{q} = \frac{\theta }{r} + \frac{1-\theta }{\infty }\), i.e., \(\theta = 2/p^*\) and \(r = 2q/p^*\). Using the Riesz–Thorin interpolation theorem, the fact that \(\Vert X:\ell ^n_1 \rightarrow \ell ^m_\infty \Vert \le 1\) (since the entries take values in \([-1,1]\)), and Jensen’s inequality, we arrive at

$$\begin{aligned} \mathbb {E}\Vert X:\ell ^n_p \rightarrow \ell ^m_q\Vert&\le \mathbb {E}\Vert X:\ell ^n_2 \rightarrow \ell ^m_r\Vert ^\theta \Vert X:\ell ^n_1 \rightarrow \ell ^m_\infty \Vert ^{1-\theta }\\&\le \mathbb {E}\Vert X:\ell ^n_2 \rightarrow \ell ^m_r\Vert ^\theta \le \bigl (\mathbb {E}\Vert X:\ell ^n_2 \rightarrow \ell ^m_r\Vert \bigr )^\theta \\&\lesssim _{p,q} \max \{n^{1/2}, m^{1/r}\}^{\theta } = \max \{n^{1/p^*}, m^{1/q}\}. \end{aligned}$$
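To illustrate the choice of parameters (a worked instance added for concreteness): for \(p=4/3\) and \(q=4\) we have \(p^*=4\), hence \(\theta =1/2\) and \(r=2\), and the above chain yields

$$\begin{aligned} \mathbb {E}\Vert X:\ell ^n_{4/3} \rightarrow \ell ^m_4\Vert \le \bigl (\mathbb {E}\Vert X:\ell ^n_2 \rightarrow \ell ^m_2\Vert \bigr )^{1/2} \lesssim \max \{n^{1/2}, m^{1/2}\}^{1/2} = \max \{n^{1/4}, m^{1/4}\}, \end{aligned}$$

which agrees with \(\max \{n^{1/p^*}, m^{1/q}\}\).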

The estimates in the remaining ranges of p and q follow by duality (1.12). Moreover, up to constants, all these estimates are optimal, as they can be reversed for matrices with \(\pm 1\) entries (see [8, Proposition 3.2] or [13, Satz 2]).

4.3 \(\psi _r\) random variables

In this section, we prove Theorem 1.15. To this end we shall split the matrix X into two parts \(X^{(1)}\) and \(X^{(2)}\) such that all entries of \(X^{(1)}\) are bounded by \(C\ln (mn)^{1/r}\). We shall deal with \(X^{(2)}\) using the crude bound of Lemma 4.3 below and the fact that the probability that \(X^{(2)} \ne 0\) is very small. In order to bound the expectation of the norm of \(X^{(1)}\) we need a cut-off version of Theorem 1.15 (see Lemma 4.4 below).

Lemma 4.3

Let \(r\in (0,2]\). Assume that \(X = (X_{ij})_{i\le m, j\le n}\) satisfies the assumptions of Theorem 1.15. Then

$$\begin{aligned} \bigl (\mathbb {E}\Vert X_A:\ell _{p}^n\rightarrow \ell _q^m\Vert ^2\bigr )^{1/2} \lesssim _{r,K,L} (m+n)^{1/r} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2} \Vert ^{1/2}. \end{aligned}$$

Proof

By a standard volumetric estimate (see, e.g., [64, Corollary 4.2.13]), we know that there exists (in the metric \(\Vert \cdot \Vert _p\)) a 1/2-net S in \(B_p^n\) of size at most \(5^n\). In other words, for any \(x\in B_p^n\) there exists \(y\in S\) such that \(x-y \in \frac{1}{2} B_p^n\). Thus, for any \(z\in \mathbb {R}^n\),

$$\begin{aligned} \sup _{x\in B_p^n} \sum _{j=1}^n x_jz_j&\le \sup _{x\in B_p^n} \min _{y\in S} \sum _{j=1}^n (x_j-y_j) z_j + \sup _{y\in S} \sum _{j=1}^n y_j z_j \\&\le \sup _{u\in \frac{1}{2} B_p^n} \sum _{j=1}^n u_j z_j + \sup _{y\in S} \sum _{j=1}^n y_j z_j = \frac{1}{2}\sup _{x\in B_p^n} \sum _{j=1}^n x_j z_j + \sup _{y\in S} \sum _{j=1}^n y_j z_j . \end{aligned}$$

Hence,

$$\begin{aligned} \sup _{x\in B_p^n} \sum _{j=1}^n x_jz_j \le 2 \sup _{y\in S} \sum _{j=1}^n y_j z_j. \end{aligned}$$
(4.23)

Likewise, if we denote by T a 1/2-net in \(B_{q^*}^m\) (in the metric \(\Vert \cdot \Vert _{q^*}\)) of size at most \(5^m\), then

$$\begin{aligned} \sup _{x\in B_{q^*}^m} \sum _{i=1}^m x_iz_i \le 2 \sup _{y\in T} \sum _{i=1}^m y_i z_i. \end{aligned}$$
(4.24)

Combining these two estimates, we see that

$$\begin{aligned} \bigl (\mathbb {E}\Vert X_A:\ell _{p}^n\rightarrow \ell _q^m\Vert ^2\bigr )^{1/2}&= \bigl ( \mathbb {E}\sup _{x\in B_p^n, y\in B_{q^*}^m} \bigl ( \sum _{i=1}^m \sum _{j=1}^n y_i a_{ij} X_{ij} x_j\bigr )^2 \bigr )^{1/2} \nonumber \\&\le 4 \bigl (\mathbb {E}\sup _{x\in S, y\in T} \bigl ( \sum _{i=1}^m \sum _{j=1}^n y_i a_{ij} X_{ij} x_j\bigr )^2\bigr )^{1/2}. \end{aligned}$$
(4.25)

Lemma 2.19 implies that for any \(x\in \mathbb {R}^n\), \(y\in \mathbb {R}^m\), the random variable

$$\begin{aligned} Z(x,y){:=}\bigl (\sum _{i,j} y_i^2a_{ij}^2x_j^2 \bigr )^{-1/2} \sum _{i=1}^m \sum _{j=1}^n y_i a_{ij} X_{ij} x_j \end{aligned}$$

satisfies condition (i) in Lemma 2.18. Thus, Lemma 2.18 implies that

$$\begin{aligned} \mathbb {E}\exp \Bigl ( c(r,K,L) \bigl (\sum _{i,j} y_i^2a_{ij}^2x_j^2 \bigr )^{-r/2} \Bigl ( \sum _{i=1}^m \sum _{j=1}^n y_i a_{ij} X_{ij} x_j\Bigr )^{r}\Bigr ) \le C(r,K,L),\qquad \end{aligned}$$
(4.26)

where \(c(r,K,L) \in (0,\infty )\) and \(C(r,K,L) \in (0,\infty )\) depend only on r, K, and L.

The function \(z\mapsto e^{z^{r/2}}\) is convex on \([(2r^{-1}-1)^{2/r},\infty )\). Therefore, by Jensen’s inequality, for any \(u> 0\) and any nonnegative random variable Z,

$$\begin{aligned} \exp \bigl (u (\mathbb {E}Z^2)^{r/2}\bigr )&\le \exp \bigl ( (u^{2/r}\mathbb {E}Z^2 + (2r^{-1}-1)^{2/r})^{r/2}\bigr )\\&\le \mathbb {E}\exp \bigl ( (u^{2/r} Z^2 + (2r^{-1}-1)^{2/r})^{r/2}\bigr )\\&\le \mathbb {E}\exp \bigl ( u Z^r + (2r^{-1}-1) \bigr ) \le e^{2/r} \mathbb {E}\exp (u Z^r). \end{aligned}$$

Hence,

$$\begin{aligned} (\mathbb {E}Z^2)^{1/2} \le u^{-1/r} \Bigl (\ln \bigl (e^{2/r} \mathbb {E}\exp (u Z^{r})\bigr )\Bigr )^{1/r}. \end{aligned}$$

Thus, when

$$\begin{aligned} u{:=}c(r,K,L) \left( \max _{x\in S, y\in T} \sum _{i,j} y_i^2a_{ij}^2x_j^2 \right) ^{-r/2}, \end{aligned}$$

we get by (4.26), (3.10), and (3.11),

$$\begin{aligned} \Bigl (\mathbb {E}\sup _{x\in S, y\in T} \bigl ( \sum _{i=1}^m \sum _{j=1}^n y_i a_{ij} X_{ij} x_j\bigr )^2\Bigr )^{1/2}&\le u^{-1/r} \Bigl (\ln \bigl (e^{2/r}\, |S|\, |T|\, C(r,K,L)\bigr )\Bigr )^{1/r}\\&\lesssim _{r,K,L} (m+n)^{1/r} \max _{x\in S, y\in T} \Bigl (\sum _{i,j} y_i^2a_{ij}^2x_j^2 \Bigr )^{1/2}\\&\le (m+n)^{1/r} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2} \Vert ^{1/2}, \end{aligned}$$

where in the last two inequalities we also used inequalities \(|S| \le 5^n\) and \(|T|\le 5^m\), and the inclusions \(S\subset B_p^n\), \(T\subset B_{q^*}^m\). Recalling (4.25) completes the proof. \(\square \)

The following cut-off version of Theorem 1.15 can be proved similarly to Proposition 1.7.

Lemma 4.4

Let \(K,L, M>0\) and \(r\in (0,2]\). Assume \(X=(X_{ij})_{i\le m, j\le n}\) is a random matrix with independent symmetric entries taking values in \([-M,M]\) and satisfying the condition

$$\begin{aligned} \mathbb {P}\bigl (|X_{ij}| \ge t\bigr ) \le Ke^{-t^r/L} \quad \text {for all } t\ge 0. \end{aligned}$$
(4.27)

Then, for \(1\le p\le 2\) and \(1\le q< \infty \), we have

$$\begin{aligned} \mathbb {E}\Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert&\lesssim q^{1/r} C(r,K,L) \ln (en)^{1/p^*} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}\\&\quad + M \ln (en)^{1/2+1/p^*} \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2}. \end{aligned}$$

Proof

Fix \(1\le p\le 2\) and \(1\le q< \infty \). Let K be the set defined in Lemma 2.3 so that \(B_p^n\subset \ln (en)^{1/p^*} K\). Then

$$\begin{aligned} \Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert = \sup _{x\in B_p^n} \Vert X_A x\Vert _q \le \ln (en)^{1/p^*}\! \sup _{x\in {\text {Ext}}(K)}\! \Vert X_A x\Vert _q, \end{aligned}$$
(4.28)

where \({\text {Ext}}(K)\) is the set of extreme points of K. We shall now estimate the expected value of the right-hand side of (4.28).

To this end, we consider a fixed \(x=(x_j)_{j=1}^n \in {\text {Ext}}(K)\). This means that there exists a non-empty index set \(J\subset \{1,\dots ,n\}\) of cardinality \(k\le n\) such that \(x_j = \frac{\pm 1}{k^{1/p}}\) for \(j\in J\) and \(x_j=0\) for \(j\notin J\). We know from (4.4) that the Lipschitz constant of the convex function

$$\begin{aligned} z=(z_{ij})_{ij} \mapsto \Bigl \Vert \Bigl ( \sum _{j=1}^n a_{ij}z_{ij} x_j \Bigr )_i \Bigr \Vert _q = \sup _{y\in B_{q^*}^m}\sum _{i=1}^m \sum _{j=1}^n y_ia_{ij}z_{ij}x_j \end{aligned}$$

is less than or equal to

$$\begin{aligned} \frac{1}{k^{1/p}}\sqrt{ \sup _{y\in B^m_{q^*/2}} \sum _{i=1}^m \sum _{j\in J} y_i a_{ij}^2 }\, {=:}\frac{ b_J}{k^{1/p}}. \end{aligned}$$

Thus, Talagrand’s concentration inequality for convex functions and random vectors with independent bounded coordinates (see [56, Theorem 6.6 and Eq. (6.18)]), together with the inequality \({\text {Med}}(|Z|)\le 2\mathbb {E}|Z|\), implies

$$\begin{aligned} \mathbb {P}(\Vert X_Ax\Vert _q \ge 2\mathbb {E}\Vert X_Ax\Vert _q + t) \le 4\exp \Bigl (-\frac{k^{2/p}t^2}{16M^2b_J^2}\Bigr ) \qquad \text {for all } t\ge 0. \qquad \end{aligned}$$
(4.29)

Similarly to the proof in the Gaussian case (i.e., the proof of Proposition 1.7), we shall transform this into a more convenient form by getting rid of \(b_J\) and estimating \(\mathbb {E}\Vert X_A x\Vert _q \). Let us denote, for each \(i\in \{1,\dots ,m\}\),

$$\begin{aligned} Z_i&{:=}\sum _{j=1}^n a_{ij}X_{ij} x_j. \end{aligned}$$

From our assumption (4.27) as well as Lemmas 2.19 and 2.18, we obtain that \((\mathbb {E}|Z_i|^q)^{1/q}\lesssim _{r,K,L}q^{1/r}\sqrt{\sum _{j=1}^n a_{ij}^2 x_j^2}\). Hence,

$$\begin{aligned} \mathbb {E}\Vert X_A x\Vert _q&\le \bigl (\mathbb {E}\Vert X_Ax\Vert _q^q\bigr )^{1/q} = \bigl (\sum _{i=1}^m \mathbb {E}|(X_Ax)_i|^q\bigr )^{1/q} \lesssim _{r,K,L} q^{1/r} \Bigl ( \sum _{i=1}^m \bigl (\sum _{j=1}^n a_{ij}^2 x_j^2\bigr )^{q/2} \Bigr )^{1/q} \\&\le q^{1/r} \sup _{z\in B_p^n} \Bigl ( \sum _{i=1}^m \Bigl | \sum _{j=1}^n a_{ij}^2 z_j^2\Bigr |^{q/2}\Bigr )^{1/q} = q^{1/r} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2} {=:}q^{1/r} a. \end{aligned}$$

From (4.6), we see that

$$\begin{aligned} k^{2/p^*-1} b_J^2 \le \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert {=:}b^2. \end{aligned}$$

The above two inequalities, together with estimate (4.29) (applied with \(t=4k^{\frac{1}{p^*}-\frac{1}{2}}b_J M\sqrt{\ln (en)}\, s\)), imply that

$$\begin{aligned} \mathbb {P}\bigl (\Vert X_A x\Vert _q \ge C(r,K,L) q^{1/r}a + 4bM \sqrt{\ln (en)} s \bigr ) \le 4 \exp \bigl ( - k \ln (en)s^2\bigr ) \end{aligned}$$
(4.30)

for every \(s\ge 0\) and any \(x\in {\text {Ext}}(K)\) with support of cardinality k.

For any \(k\le n\), there are \(2^k \left( {\begin{array}{c}n\\ k\end{array}}\right) \le 2^k n^k \le \exp ( k \ln (en))\) vectors in \({\text {Ext}}(K)\) with support of cardinality k. Thus, using the union bound and (4.30), we see that for \(s\ge \sqrt{2}\),

$$\begin{aligned}&\mathbb {P}\bigl (\sup _{x\in {\text {Ext}}(K)}\! \Vert X_A x\Vert _q \ge C(r,K,L) q^{1/r}a + 4bM \sqrt{\ln (en)}\, s\bigr )\\&\quad \le 4\sum _{k=1}^n \exp ( -k \ln (en)(s^2-1)) \le 4n\exp ( - \ln (en)(s^2-1)) = 4n(en)^{-s^2+1}\le 4 e^{-s^2+1}. \end{aligned}$$

Hence, by Lemma 2.6,

$$\begin{aligned} \mathbb {E}\sup _{x\in {\text {Ext}}(K)} \Vert X_A x\Vert _q&\le C(r,K,L)q^{1/r}a + 4b M \sqrt{\ln (en)} \Bigl (\sqrt{2} +4 e \frac{e^{-2}}{2\sqrt{2}}\Bigr ). \end{aligned}$$

Recalling (4.28) and the definitions of a and b yields the assertion. \(\square \)

Proof of Theorem 1.15

By a symmetrization argument (as in the first paragraph of the proof of Theorem 3.5), we may and do assume that all the entries \(X_{ij}\) are symmetric. Set \(M = (4\,L \ln (mn)/r)^{1/r}\). Denote \({\widehat{X}}_{ij} = X_{ij} {\textbf{1}}_{\{|X_{ij}|\le M\}}\) and let \({\widehat{X}}\) be the \(m\times n\) matrix with entries \({\widehat{X}}_{ij}\). We have

$$\begin{aligned} \mathbb {E}\Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert&= \mathbb {E}\Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert {\textbf{1}}_{\{ \max _{k,l} |X_{kl}|\le M\}} \\&\quad + \mathbb {E}\Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert {\textbf{1}}_{\{ \max _{k,l} |X_{kl}|> M\}}. \end{aligned}$$

The random matrix \({\widehat{X}}\) satisfies the assumptions of Lemma 4.4. Thus, the first summand above can be estimated as follows:

$$\begin{aligned}&\mathbb {E}\Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert {\textbf{1}}_{\{ \max _{k,l} |X_{kl}|\le M\}}\\&\quad = \mathbb {E}\sup _{y\in B_{q^*}^m, x\in B_p^n} \bigl \{ \sum _{i=1}^m \sum _{j=1}^n y_i a_{ij} X_{ij} x_j \bigr \} \cdot {\textbf{1}}_{\{ \max _{k,l} |X_{kl}|\le M\}} \\&\quad = \mathbb {E}\sup _{y\in B_{q^*}^m, x\in B_p^n}\bigl \{\sum _{i=1}^m \sum _{j=1}^n y_i a_{ij} X_{ij} {\textbf{1}}_{\{|X_{ij}|\le M\}} x_j \bigr \} \cdot {\textbf{1}}_{\{ \max _{k,l} |X_{kl}|\le M\}} \\&\quad = \mathbb {E}\Vert {\widehat{X}}_A :\ell ^n_p\rightarrow \ell ^m_q\Vert {\textbf{1}}_{\{ \max _{k,l} |X_{kl}|\le M\}} \le \mathbb {E}\Vert {\widehat{X}}_A :\ell ^n_p\rightarrow \ell ^m_q\Vert \\&\quad \lesssim _{r, K, L} \ q^{1/r}\ln (en)^{1/p^*} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}\\&\qquad \ +M \ln (en)^{1/2+1/p^*} \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2} \\&\quad \lesssim _{r, K, L} \ q^{1/r} \ln (en)^{1/p^*} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}\\&\qquad \ + \ln (mn)^{1/r} \ln (en)^{1/2+1/p^*} \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2}. \end{aligned}$$

For the second summand we write, using the Cauchy–Schwarz inequality and then Lemmas 4.3 and 2.22 (with \(k=mn\) and \(v=4/r\); recall that \(M = (4\,L \ln (mn)/r)^{1/r}\)),

$$\begin{aligned}&\mathbb {E}\Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert {\textbf{1}}_{\{ \max _{k,l} |X_{kl}|> M\}}\\&\quad \le \bigl ( \mathbb {E}\Vert X_A :\ell ^n_p\rightarrow \ell ^m_q\Vert ^2\bigr )^{1/2} \mathbb {P}( \max _{k\le m, l\le n} |X_{kl}| > M)^{1/2}\\&\quad \lesssim _{r, K, L} (m+n)^{1/r} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2} \Vert ^{1/2} \cdot (mn)^{-2/r +1/2}\\&\quad \lesssim _{r} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2} \Vert ^{1/2}. \end{aligned}$$
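For the reader's convenience, let us record why this choice of M makes the second summand negligible: assuming the entries satisfy the tail bound (4.27) (as in Theorem 1.15), a union bound over the mn entries gives

$$\begin{aligned} \mathbb {P}\bigl ( \max _{k\le m, l\le n} |X_{kl}| > M \bigr ) \le mn\, K e^{-M^r/L} = K (mn)^{1-4/r}, \end{aligned}$$

whose square root is, up to a constant depending on K, the factor \((mn)^{-2/r+1/2}\) appearing above.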

Combining the above three inequalities ends the proof. \(\square \)

5 Lower bounds and further discussion of conjectures

5.1 Lower bounds

Let us first provide lower bounds showing that the upper bounds obtained above are indeed sharp (up to logarithmic factors).

Proposition 5.1

Let \(X = (X_{ij})_{i\le m, j\le n}\) be a random matrix with independent mean-zero entries satisfying \(\mathbb {E}|X_{ij}| \ge c\) for some \(c\in (0,\infty )\). Then, for all \(1\le p, q\le \infty \),

$$\begin{aligned} \mathbb {E}\Vert X_A:\ell _{p}^n\rightarrow \ell _q^m\Vert \ge \frac{c}{2\sqrt{2}}\Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2} \Vert ^{1/2}. \end{aligned}$$

Using duality (1.12) we immediately obtain the following corollary.

Corollary 5.2

Let \(X = (X_{ij})_{i\le m, j\le n}\) be as in Proposition 5.1. Then, for all \(1\le p,q\le \infty \),

$$\begin{aligned} \mathbb {E}\Vert X_A:\ell _{p}^n\rightarrow \ell _q^m\Vert \ge \frac{c}{2\sqrt{2}}\Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2}. \end{aligned}$$

Proof of Proposition 5.1

Let \(\Vert \cdot \Vert \) denote the operator norm from \(\ell _{p}^n\) to \(\ell _q^m\). For \(i\in \{1,\dots ,m\}\) and \(j\in \{1,\dots ,n \}\), let us denote by \(E_{ij}\) the \(m\times n\) matrix with entry 1 at the intersection of the ith row and jth column and with all other entries 0. By the symmetrization trick described in Remark 3.4, it suffices to consider matrices X with symmetric entries and prove the assertion with the better constant \(c/\sqrt{2}\) in place of \(c/(2\sqrt{2})\) (note that, also by Remark 3.4, the lower bound for the absolute first moment of the symmetrized entries does not change and is still equal to c).

If X has symmetric independent entries, it has the same distribution as \((\varepsilon _{ij}|X_{ij}|)_{ij}\), where \(\varepsilon _{ij}\), \(i\le m\), \(j\le n\), are i.i.d. Rademacher random variables, independent of all other random variables. Hence, by Jensen’s inequality and the contraction principle (Lemma 2.7 applied with \(\alpha _{ij} = 1/\mathbb {E}|X_{ij}| \le 1/c\) and \(x_{ij}=a_{ij}\mathbb {E}|X_{ij}| E_{ij}\)), we get

$$\begin{aligned} \mathbb {E}\Bigl \Vert \sum _{i=1}^m \sum _{j=1}^n X_{ij} a_{ij}E_{ij} \Bigr \Vert&= \mathbb {E}\Bigl \Vert \sum _{i,j}\varepsilon _{ij}|X_{ij}|a_{ij}E_{ij} \Bigr \Vert \ge \mathbb {E}\Bigl \Vert \sum _{i,j} \varepsilon _{ij}\mathbb {E}|X_{ij}|a_{ij} E_{ij} \Bigr \Vert \nonumber \\&\ge c\ \mathbb {E}\Bigl \Vert \sum _{i,j} \varepsilon _{ij}a_{ij} E_{ij} \Bigr \Vert . \end{aligned}$$
(5.1)

Thus, it suffices to bound \(\mathbb {E}\Vert \sum _{i,j} \varepsilon _{ij}a_{ij} E_{ij} \Vert \) from below.

Since the \(\ell _q\) norm is unconditional, we obtain from the inequalities of Jensen and Khintchine (see [26]) that

$$\begin{aligned} \mathbb {E}\Bigl \Vert \sum _{i=1}^m \sum _{j=1}^n \varepsilon _{ij}a_{ij} E_{ij} \Bigr \Vert&= \mathbb {E}\sup _{x\in B_p^n} \Bigl \Vert \bigl ( \sum _{j=1}^n a_{ij}\varepsilon _{ij} x_j\bigr )_{i=1}^m \Bigr \Vert _q = \mathbb {E}\sup _{x\in B_p^n} \Bigl \Vert \Bigl ( \bigl | \sum _{j=1}^n a_{ij}\varepsilon _{ij} x_j \bigr |\Bigr )_{i=1}^m \Bigr \Vert _q \\&\ge \sup _{x\in B_p^n} \Bigl \Vert \Bigl ( \mathbb {E}\bigl | \sum _{j=1}^n a_{ij}\varepsilon _{ij} x_j \bigr |\Bigr )_{i=1}^m \Bigr \Vert _q\\&\mathop {\ge }^{\text {Khintchine's}}_{\text {inequality}} \frac{1}{\sqrt{2}} \sup _{x\in B_p^n} \Bigl \Vert \Bigl ( \bigl ( \sum _{j=1}^n a_{ij}^2 x_j^2 \bigr )^{1/2} \Bigr )_{i=1}^m \Bigr \Vert _q \\&= \frac{1}{\sqrt{2}} \sup _{z\in B_{p/2}^n} \Bigl \Vert \Bigl ( \sum _{j=1}^n a_{ij}^2 z_j \Bigr )_{i=1}^m \Bigr \Vert _{q/2}^{1/2} \\ {}&= \frac{1}{\sqrt{2}} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2} \Vert ^{1/2}. \end{aligned}$$

This together with the estimate in (5.1) yields the assertion. \(\square \)

Since \(\Vert G_A:\ell _{p}^n\rightarrow \ell _q^m\Vert \ge \max _{i,j} |a_{ij}g_{ij}|\), it suffices to prove the following proposition in order to provide the lower bound in Conjecture 1.

Proposition 5.3

For the \(m\times n\) Gaussian matrix \(G_A\), we have

$$\begin{aligned} \mathbb {E}\Vert G_A:\ell _{p}^n\rightarrow \ell _q^m\Vert \gtrsim _{p,q} {\left\{ \begin{array}{ll} \max _{j\le n}\sqrt{\ln (j+1)} b_j^{\downarrow {}} &{} \text {if }\ p\le q\le 2,\\ \max _{i\le m}\sqrt{\ln (i+1)} d_i^{\downarrow {}} &{} \text {if } \ 2\le p \le q, \\ 0 &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$
(5.2)

where \(b_j= \Vert (a_{ij})_{i\le m} \Vert _{2q/(2-q)}\) and \(d_i=\Vert (a_{ij})_{j\le n}\Vert _{2p/(p-2)}\).

Proof

Since \(B_1^n\subset B_p^n\) for \(p\ge 1\) and the \(b_j\)’s do not depend on p, it suffices to prove the first part of the assertion (in the range \(p\le q\le 2\)) only in the case \(p=1\le q\le 2\). In this case, (5.2) follows from Proposition 1.10.

The assertion in the range \(2\le p \le q\) follows by duality (1.12). \(\square \)

5.2 The proof of Inequalities (1.13) and (1.11)

Let us now show that in the case \(q<p\), the third term on the right-hand side in Conjecture 1 is not needed. To this end it suffices to prove (1.13) only in the case \(q<2\), since the case \(p>2\) follows by duality (1.12).

Proposition 5.4

Whenever \(1\le q<p \le \infty \) and \(q<2\), we have

$$\begin{aligned} D_2=\Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2} \gtrsim _{p,q} \max _{j\le n}\sqrt{\ln (j+1)} b_j^{\downarrow {}}, \end{aligned}$$
(5.3)

where \(b_j= \Vert (a_{ij})_{i\le m} \Vert _{2q/(2-q)}\).

Proof

Since the right-hand side of (5.3) does not depend on p, and the left-hand side is non-decreasing with p, we may consider only the case \(1\le q<p\le 2\). By permuting the columns of A we may and do assume without loss of generality that the sequence \((b_j)_j\) is non-increasing.

Fix \(j_0\le n\). Let r be the midpoint of the non-empty interval \((\frac{2-p}{p}, \frac{2-q}{q})\). Take \(x=(x_j)_{j\le n}\) with \(x_j=\frac{1}{j^r}\). Since \(rp/(2-p)>1\), we have

$$\begin{aligned} \sum _{j=1}^n x_j^{p/(2-p)} \le \sum _{j=1}^\infty \frac{1}{j^{rp/(2-p)}} =C(p,q)<\infty , \end{aligned}$$

so \(x\in C'(p,q)B^n_{p/(2-p)}=C'(p,q)B^n_{(p^*/2)^*}\). Therefore, the inequality \((q^*/2)^*= q/(2-q) \ge 1\) and the facts that \(b_j\ge b_{j_0}\) for all \(j\le j_0\) and that \(r<(2-q)/q\) imply

$$\begin{aligned} D_2^2&= \sup _{z\in B_{(p^*/2)^*}^n} \biggl ( \sum _{i=1}^m \Bigl ( \sum _{j=1}^n a_{ij}^2 z_j\Bigr )^{(q^*/2)^*} \biggr )^{1/(q^*/2)^*} \gtrsim _{p,q} \biggl ( \sum _{i=1}^m \Bigl ( \sum _{j=1}^{j_0} a_{ij}^2 j^{-r}\Bigr )^{q/(2-q)} \biggr )^{(2-q)/q} \\&\ge \Bigl ( \sum _{i=1}^m \sum _{j=1}^{j_0} a_{ij}^{2q/(2-q)} j^{-{rq/(2-q)}} \Bigr )^{(2-q)/q} = \Bigl ( \sum _{j=1}^{j_0} b_j^{2q/(2-q)} j^{-{rq/(2-q)}} \Bigr )^{(2-q)/q} \\&\ge b_{j_0}^2\, j_0^{-r+(2-q)/q } \gtrsim _{p,q} b_{j_0}^2 \ln (j_0+1). \end{aligned}$$

Taking the maximum over all \(j_0\le n\) completes the proof. \(\square \)

Now we turn to the proof of (1.11). Note that it suffices to prove only the first two-sided inequality in (1.11), since the second one follows from it by duality (1.12).

Proposition 5.5

For all \(1\le p, q\le \infty \), we have

$$\begin{aligned}&\Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2} \Vert ^{1/2}+ \mathbb {E}\max _{i \le m,j\le n} |a_{ij}g_{ij}| \nonumber \\&\quad \asymp _{q} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2} \Vert ^{1/2}+\max _{i\le m, j\le n}\sqrt{\ln (j+1)} a_{ij}', \end{aligned}$$
(5.4)

where the matrix \((a_{ij}')_{i,j}\) is obtained by permuting the columns of the matrix \((|a_{ij}|)_{i,j}\) in such a way that \(\max _i a_{i1}'\ge \dots \ge \max _i a_{in}'\).

Proof

By permuting the columns of the matrix A, we can assume that the sequence \((\max _{i\le m} |a_{ij}|)_{j=1}^n\) is non-increasing. We have

$$\begin{aligned} \mathbb {E}\max _{i\le m,j\le n} |a_{ij} g_{ij}|&\le \mathbb {E}\max _{j\le n} \Big (\max _{i\le m} |a_{ij} g_{ij}| - \mathbb {E}\max _{i\le m} |a_{ij} g_{ij}|\Big )\nonumber \\&\quad + \max _{j\le n} \mathbb {E}\max _{i\le m}|a_{ij}g_{ij}|. \end{aligned}$$
(5.5)

The function \(y \mapsto \max _{i\le m} |a_{ij} y_i|\) is \(\max _{i\le m} |a_{ij}|\)-Lipschitz with respect to the Euclidean norm on \(\mathbb {R}^m\), so by Gaussian concentration (see, e.g., [41, Chapter 5.1]),

$$\begin{aligned} \mathbb {P}\bigl (\max _{i\le m} |a_{ij} g_{ij}| - \mathbb {E}\max _{i\le m} |a_{ij} g_{ij}| \ge t \bigr ) \le \exp \Bigl (-\frac{t^2}{2\max _{i\le m} a_{ij}^2}\Bigr ) \end{aligned}$$

for all \(t\ge 0\), \(j\le n\). Thus, Lemma 2.11 and inequality (5.5) imply

$$\begin{aligned} \mathbb {E}\max _{i\le m,j\le n} |a_{ij} g_{ij}| \lesssim \max _{j\le n} \Big (\sqrt{\ln (j+1)}\max _{i\le m} |a_{ij}|\Big )+ \max _{j\le n} \mathbb {E}\max _{i\le m}|a_{ij}g_{ij}|. \end{aligned}$$
(5.6)

We have

$$\begin{aligned} \max _{j\le n} \mathbb {E}\max _{i\le m}|a_{ij}g_{ij}|&\le \max _{j\le n} \mathbb {E}\Big (\sum _{i=1}^m |a_{ij} g_{ij}|^q\Big )^{1/q} \le \gamma _q \max _{j\le n} \Vert (a_{ij})_{i}\Vert _q \\&= \gamma _q \max _{j\le n} \Vert (a_{ij}^2)_{i}\Vert _{q/2}^{1/2} \le \gamma _q \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2} \Vert ^{1/2}, \end{aligned}$$

which, together with (5.6), provides the asserted upper bound.

On the other hand, if \((a_{l}^{\downarrow {}})_{l\le mn}\) denotes the non-increasing rearrangement of the sequence of all absolute values of entries of A, then Lemma 2.12 implies

$$\begin{aligned} \mathbb {E}\max _{j\le n} \max _{i\le m}|a_{ij} g_{ij}|&\gtrsim \max _{l\le mn} \sqrt{\ln (l+1)}a_l^{\downarrow {}} \ge \max _{j\le n} \sqrt{\ln (j+1)} a_{j}^{\downarrow {}} \\&\ge \max _{j\le n} \Big (\sqrt{\ln (j+1)}\max _{i\le m} a_{ij}'\Big ), \end{aligned}$$

which provides the asserted lower bound. \(\square \)

Note that the above proof shows in fact that

$$\begin{aligned}&\max _{j\le n} \Vert (a_{ij})_{i}\Vert _q + \mathbb {E}\max _{i \le m,j\le n} |a_{ij}g_{ij}| \\&\quad \asymp _{q} \max _{j\le n} \Vert (a_{ij})_{i}\Vert _q +\max _{i\le m, j\le n}\sqrt{\ln (j+1)} a_{ij}', \end{aligned}$$

so

$$\begin{aligned}&\max _{j\le n} \Vert (a_{ij})_{i}\Vert _q + \max _{i\le m} \Vert (a_{ij})_{j}\Vert _{p^*} + \max _{j\le n, i \le m} \sqrt{\ln (i+1)} a_{ij}'' \nonumber \\&\quad \asymp _{q} \max _{j\le n} \Vert (a_{ij})_{i}\Vert _q + \max _{i\le m} \Vert (a_{ij})_{j}\Vert _{p^*} +\max _{i\le m, j\le n}\sqrt{\ln (j+1)} a_{ij}', \end{aligned}$$
(5.7)

where the matrix \((a_{ij}'')_{i,j}\) is obtained by permuting the rows of the matrix \((|a_{ij}|)_{i,j}\) in such a way that \(\max _j a_{1j}''\ge \dots \ge \max _j a_{mj}''\).

5.3 Counterexample to a seemingly natural conjecture

In this subsection we provide an example showing that for any \(p\le q <2\) the bound

$$\begin{aligned} \mathbb {E}\Vert G_A:\ell _{p}^n\rightarrow \ell _q^m\Vert&\lesssim _{p,q} \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2} \Vert ^{1/2} + \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2} \nonumber \\&\quad + \mathbb {E}\max _{i\le m, j\le n}|a_{ij}g_{ij}| \end{aligned}$$
(5.8)

cannot hold. By duality (1.12), it also cannot hold for any \(2<p\le q\). This shows that Conjecture 1 cannot be simplified to a bound of the form appearing on the right-hand side of (1.8).

Let \(p\le q <2\), \(k, N\in \mathbb {N}\), and let \(A_1, \ldots , A_N\) be \(k\times k\) matrices with all entries equal to one. Consider a block matrix

$$\begin{aligned} A = \begin{pmatrix} A_1 &{} &{} &{} \\ &{} A_2 &{} &{} \\ &{} &{} \ddots &{} \\ &{} &{} &{} A_N \end{pmatrix} \end{aligned}$$

of size \(kN \times kN\), with blocks \(A_1, \ldots , A_N\) on the diagonal and with all other entries equal to 0.

Note that since \(p\le q \le 2\),

$$\begin{aligned} \Vert A\mathbin {\circ }A :\ell ^{kN}_{p/2} \rightarrow \ell ^{kN}_{q/2} \Vert&= \max _{l\le N} \Vert A_l\mathbin {\circ }A_l :\ell ^{k}_{p/2} \rightarrow \ell ^{k}_{q/2} \Vert = \Vert A_1\mathbin {\circ }A_1 :\ell ^{k}_{p/2} \rightarrow \ell ^{k}_{q/2} \Vert \\&=\sup _{x\in B_{p/2}^k} \Bigl ( \sum _{i=1}^k \Bigl | \sum _{j=1}^k x_j \Bigr |^{q/2} \Bigr )^{2/q} =\sup _{x\in B_{p/2}^k} k^{2/q} \Bigl | \sum _{j=1}^k x_j \Bigr | = k^{2/q}, \end{aligned}$$

and similarly, since \(2\le q^*\le p^{*}\),

$$\begin{aligned} \Vert (A\mathbin {\circ }A)^T :\ell ^{kN}_{q^*/2} \rightarrow \ell ^{kN}_{p^*/2} \Vert = \Vert (A_1\mathbin {\circ }A_1)^T :\ell ^{k}_{q^*/2} \rightarrow \ell ^{k}_{p^*/2} \Vert = k^{2/p^*+1-2/q^*}. \end{aligned}$$
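Indeed (spelling out the computation for the reader's convenience), for \(x\in \mathbb {R}^k\) we have \((A_1\mathbin {\circ }A_1)^T x = \bigl (\sum _{j=1}^k x_j\bigr )(1,\dots ,1)\), so

$$\begin{aligned} \Vert (A_1\mathbin {\circ }A_1)^T :\ell ^{k}_{q^*/2} \rightarrow \ell ^{k}_{p^*/2} \Vert = k^{2/p^*} \sup _{x\in B_{q^*/2}^k} \Bigl |\sum _{j=1}^k x_j\Bigr | = k^{2/p^*}\, k^{1-2/q^*}, \end{aligned}$$

where in the last step we used that \(q^*/2\ge 1\), so that \(\sup _{x\in B_{q^*/2}^k} |\sum _{j=1}^k x_j| = k^{1/(q^*/2)^*} = k^{1-2/q^*}\).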

The two bounds above and Lemma 2.10 imply that the right-hand side of (5.8) is bounded from above by

$$\begin{aligned} C \Bigl ( k^{1/q}+k^{1/p^*+1/2-1/q^*} + \sqrt{\ln (kN)} \Bigr ). \end{aligned}$$
(5.9)

On the other hand, since for all \(j\le kN\), \(\Vert (a_{ij})_{i} \Vert _{2q/(2-q)} = k^{(2-q)/(2q)}\), we obtain from the lower bound (5.2) that

$$\begin{aligned} \mathbb {E}\Vert G_A:\ell _p^{kN}\rightarrow \ell _q^{kN}\Vert \gtrsim \sqrt{\ln (kN)} k^{(2-q)/(2q)}. \end{aligned}$$
(5.10)

If we take \(N\asymp e^{e^k}\), then (5.10) is of larger order than (5.9) as \(k\rightarrow \infty \), so (5.8) cannot hold.
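To spell out the comparison: with \(N\asymp e^{e^k}\) we have \(\sqrt{\ln (kN)} \asymp e^{k/2}\), so (5.9) is \(\asymp k^{1/q}+k^{1/p^*+1/2-1/q^*}+e^{k/2}\) while (5.10) is \(\asymp e^{k/2} k^{(2-q)/(2q)}\); since the polynomial terms are dominated by \(e^{k/2}\) and \((2-q)/(2q)>0\) for \(q<2\), the ratio of (5.10) to (5.9) is \(\asymp k^{(2-q)/(2q)} \rightarrow \infty \) as \(k\rightarrow \infty \).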

5.4 Discussion of another natural conjecture

In this subsection we prove all the assertions of Remark 1.1. We begin by showing that for every \(1\le p \le 2\le q \le \infty \),

$$\begin{aligned} D_1+D_2+ \mathbb {E}\max _{i,j}|a_{ij}g_{ij}| \asymp _{p,q} \mathbb {E}\max _{i\le m} \Vert (a_{ij}g_{ij})_j\Vert _{p^*} +\mathbb {E}\max _{j\le n} \Vert (a_{ij}g_{ij})_i\Vert _q, \end{aligned}$$
(5.11)

and, in the case \(p,q\ge 2\),

$$\begin{aligned}&\mathbb {E}\max _{i\le m} \Vert (a_{ij}g_{ij})_j\Vert _{p^*} +\mathbb {E}\max _{j\le n} \Vert (a_{ij}g_{ij})_i\Vert _q \nonumber \\&\quad \lesssim _{p,q} \max _{i\le m} \Vert (a_{ij})_j\Vert _{p^*} +\max _{j\le n} \Vert (a_{ij})_i\Vert _q +\max _{i\le m} \sqrt{\ln (i+1)}d_i^{\downarrow {}}, \end{aligned}$$
(5.12)

where \( D_1 = \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}\), \(D_2 = \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2}\), and \(d_i = \Vert (a_{ij})_{j\le n} \Vert _{2p/(p-2)}\). In other words, (5.11) shows that Conjecture 1 is equivalent to (1.15) as long as \(1\le p \le 2\le q \le \infty \).

Proof of (5.11) and (5.12)

Fix \(i\le m\) and let \(f(x)=\Vert (a_{ij}x_j)_j\Vert _{p^*}\) for \(x\in \mathbb {R}^n\). For \(p\ge 2\) we have \(p^*(2/p^*)^*= 2p/(p-2)\). Thus f is Lipschitz continuous with constant \(L_i\) equal to

$$\begin{aligned} \sup _{x\in B_2^n} \Bigl (\sum _{j=1}^n |a_{ij}x_{j}|^{p^*}\Bigr )^{1/p^*} = \sup _{y\in B_{2/p^*}^n} \Bigl (\sum _{j=1}^n |a_{ij}|^{p^*} y_{j} \Bigr )^{1/p^*} ={\left\{ \begin{array}{ll} \max _{j\le n} |a_{ij}| &{} \text {if }\ p\le 2,\\ \Vert (a_{ij})_j\Vert _{{2p/(p-2)}} &{} \text {if } \ p\ge 2. \end{array}\right. } \end{aligned}$$
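For completeness, the identity \(p^*(2/p^*)^* = 2p/(p-2)\) behind the second case can be verified directly: for \(p>2\) we have \(p^*=p/(p-1)<2\), hence

$$\begin{aligned} \Bigl (\frac{2}{p^*}\Bigr )^* = \frac{2/p^*}{2/p^*-1} = \frac{2}{2-p^*}, \qquad p^*\cdot \frac{2}{2-p^*} = \frac{p}{p-1}\cdot \frac{2(p-1)}{p-2} = \frac{2p}{p-2}. \end{aligned}$$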

Therefore, the Gaussian concentration inequality (see, e.g., [41, Chapter 5.1]) implies that for every \(t\ge 0\) and every \(i\le m\),

$$\begin{aligned} \mathbb {P}\Bigl (\Vert (a_{ij}g_{ij})_j\Vert _{p^*} - \mathbb {E}\Vert (a_{ij}g_{ij})_j\Vert _{p^*} \ge t \Bigr ) \le e^{-t^2/(2L_i^2)}, \end{aligned}$$

so by Lemma 2.11 we get

$$\begin{aligned}&\mathbb {E}\max _{i\le m}\Bigl (\Vert (a_{ij}g_{ij})_j\Vert _{p^*} - \mathbb {E}\Vert (a_{ij}g_{ij})_j\Vert _{p^*} \Bigr ) \nonumber \\&\quad \lesssim {\left\{ \begin{array}{ll} \max _{i\le m} \max _{j\le n} \sqrt{\ln (i+1)} a_{ij}'' &{} \text {if }\ p\le 2,\\ \max _{i\le m} \sqrt{\ln (i+1)}d_i^{\downarrow {}} &{} \text {if } \ p\ge 2, \end{array}\right. } \end{aligned}$$
(5.13)

where the matrix \((a_{ij}'')_{i,j}\) is obtained by permuting the rows of the matrix \((|a_{ij}|)_{i,j}\) in such a way that \(\max _j a_{1j}'' \ge \dots \ge \max _j a_{mj}''\).

Moreover, by Jensen’s inequality,

$$\begin{aligned} \mathbb {E}\Vert (a_{ij}g_{ij})_j\Vert _{p^*} \le \bigl (\mathbb {E}\Vert (a_{ij}g_{ij})_j\Vert _{p^*}^{p^*} \bigr )^{1/p^*} = \Bigl (\mathbb {E}\sum _{j=1}^n |a_{ij}g_{ij}|^{p^*}\Bigr )^{1/p^*} =\gamma _{p^*} \Vert (a_{ij})_j\Vert _{p^*}. \end{aligned}$$

This together with the triangle inequality and (5.13) implies

$$\begin{aligned}&\mathbb {E}\max _{i\le m} \Vert (a_{ij}g_{ij})_j\Vert _{p^*}\\&\quad \lesssim _{p} \max _{i \le m} \Vert (a_{ij})_j\Vert _{p^*} + {\left\{ \begin{array}{ll} \max _{i\le m} \max _{j\le n} \sqrt{\ln (i+1)} a_{ij}'' &{} \text {if }\ p\le 2,\\ \max _{i\le m} \sqrt{\ln (i+1)}d_i^{\downarrow {}} &{} \text {if } \ p\ge 2, \end{array}\right. } \end{aligned}$$

and, by duality,

$$\begin{aligned}&\mathbb {E}\max _{j\le n} \Vert (a_{ij}g_{ij})_i\Vert _{q} \\&\quad \lesssim _{q} \max _{j \le n}\Vert (a_{ij})_i\Vert _{q} + {\left\{ \begin{array}{ll} \max _{j\le n}\max _{i\le m} \sqrt{\ln (j+1)} a_{ij}' &{} \text {if }\ q\ge 2,\\ \max _{j\le n} \sqrt{\ln (j+1)}b_j^{\downarrow {}} &{} \text {if } \ q\le 2, \end{array}\right. } \end{aligned}$$

where \(b_j=\Vert (a_{ij})_i\Vert _{2q/(2-q)}\), and the matrix \((a_{ij}')_{i,j}\) is obtained by permuting the columns of the matrix \((|a_{ij}|)_{i,j}\) in such a way that \(\max _i a_{i1}'\ge \dots \ge \max _i a_{in}'\). This, together with Lemma 2.1 and (5.4), yields, in the case \(p\le 2\le q\),

$$\begin{aligned} \mathbb {E}\max _{i\le m} \Vert (a_{ij}g_{ij})_j\Vert _{p^*} + \mathbb {E}\max _{j\le n} \Vert (a_{ij}g_{ij})_i\Vert _{q} \lesssim _{p,q} D_1+D_2 + \mathbb {E}\max _{i,j} |a_{ij}g_{ij}|, \end{aligned}$$

which implies the lower bound in (5.11). In the case \(p, q > 2\), we additionally use (5.7) and the simple observation that

$$\begin{aligned} \max _{i\le m} \max _{j\le n} \sqrt{\ln (i+1)} a_{ij}'' \le \max _{i\le m} \sqrt{\ln (i+1)}d_i^{\downarrow {}} \end{aligned}$$

to get (5.12).

Now we move to the proof of the upper bound of (5.11) in the case \(p\le 2\le q\). Since the \(\ell _{p^*}^n\) norm is unconditional, we have by Jensen’s inequality and Lemma 2.1

$$\begin{aligned} \mathbb {E}\max _{i\le m} \Vert (a_{ij}g_{ij})_j\Vert _{p^*} =\mathbb {E}\max _{i\le m} \Vert (|a_{ij}g_{ij}|)_j\Vert _{p^*}&\ge \max _{i\le m} \Vert (|a_{ij}|\mathbb {E}|g_{ij}|)_j\Vert _{p^*} \\ {}&=\sqrt{2/\pi } \max _{i\le m} \Vert (|a_{ij}|)_j\Vert _{p^*} =\sqrt{2/\pi } D_2 , \end{aligned}$$

and dually

$$\begin{aligned} \mathbb {E}\max _{j\le n} \Vert (a_{ij}g_{ij})_i\Vert _{q} \ge \sqrt{2/\pi } D_1. \end{aligned}$$

Moreover, since \(\Vert \cdot \Vert _q\ge \Vert \cdot \Vert _{\infty }\),

$$\begin{aligned} \mathbb {E}\max _{j\le n} \Vert (a_{ij}g_{ij})_i\Vert _{q} \ge \mathbb {E}\max _{j} \max _{i} |a_{ij}g_{ij}|, \end{aligned}$$

which finishes the proof of the upper bound of (5.11). \(\square \)

Next, for every pair \((p,q)\in [1,\infty ]^2\) which does not satisfy the condition \(1\le p\le 2\le q\le \infty \) we shall give examples of \(m,n\in \mathbb {N}\), and \(m\times n\) matrices A, for which

$$\begin{aligned} \mathbb {E}\Vert G_A:\ell _{p}^n\rightarrow \ell _q^m\Vert \gg \mathbb {E}\max _{i\le m} \Vert (a_{ij}g_{ij})_j\Vert _{p^*} +\mathbb {E}\max _{j\le n} \Vert (a_{ij}g_{ij})_i\Vert _q \end{aligned}$$
(5.14)

when \(m,n \rightarrow \infty \). This shows that the natural conjecture (1.15) fails outside the range \(1\le p\le 2\le q\le \infty \). The case \(p=2=q\), in which (1.15) is valid (cf. (1.4)), is in a sense a boundary case for which the natural generalization (1.15) of (1.4) does hold.

Example 5.6

(for (5.14) in the case \(q<p\)). Let \(m=n\) and \(A={\text {Id}}_n\). Then by Lemmas 2.10 and 2.12 we have

$$\begin{aligned} \mathbb {E}\max _{i\le m} \Vert (a_{ij}g_{ij})_j\Vert _{p^*} +\mathbb {E}\max _{j\le n} \Vert (a_{ij}g_{ij})_i\Vert _q = 2\, \mathbb {E}\max _{i\le n} |g_{ii}| \asymp \sqrt{\ln n}, \end{aligned}$$

whereas Proposition 5.1 and our assumption \(p/q>1\) imply

$$\begin{aligned} \mathbb {E}\Vert G_A:\ell _{p}^n\rightarrow \ell _q^n\Vert&\gtrsim \Vert {\text {Id}}_n :\ell _{p/2}^n \rightarrow \ell _{q /2}^n \Vert ^{1/2} = \sup _{x\in B_{p/2}^n} \Bigl ( \sum _{i=1}^n |x_i|^{q/2} \Bigr )^{1/q} \\&= \Bigl ( \sup _{y\in B_{p/q}^n} \sum _{i=1}^n |y_i| \Bigr )^{1/q} = \bigl ( n^{1/(p/q)^*} \bigr )^{1/q} \gg \sqrt{\ln n}. \end{aligned}$$

Since the cases \(2<p \le q\) and \(p\le q <2\) are dual (see (1.12)), we give an example for which (5.14) holds only in the first case.

Example 5.7

(for (5.14) in the case \(2<p \le q\).) Fix p and q satisfying \(2<p \le q\). Let \(m,n \rightarrow \infty \) be such that \(m^{1/q}\gg n^{1/p^*}\), and let A be an \(m\times n\) matrix with all entries equal to 1. For \(p>2\) we have \(2(p/2)^*= 2p/(p-2)\). This together with (5.12) implies

$$\begin{aligned}&\mathbb {E}\max _{i\le m} \Vert (a_{ij}g_{ij})_j\Vert _{p^*} +\mathbb {E}\max _{j\le n} \Vert (a_{ij}g_{ij})_i\Vert _q \\&\quad \lesssim _{p,q} \max _{i\le m} \Vert (a_{ij})_j\Vert _{p^*} +\max _{j\le n} \Vert (a_{ij})_i\Vert _q +\max _{i\le m} \sqrt{\ln (i+1)}d_i^{\downarrow {}} \\&\quad = n^{1/p^*} + m^{1/q} + \sqrt{\ln (m+1)}n^{(p-2)/2p} \lesssim m^{1/q} + \sqrt{\ln m}\, n^{\frac{1}{2(p/2)^*}}. \end{aligned}$$

On the other hand, Proposition 5.1 and our assumption \(p/2>1\) imply

$$\begin{aligned} \mathbb {E}\Vert G_A:\ell _{p}^n\rightarrow \ell _q^m\Vert&\gtrsim \Vert A :\ell _{p/2}^n \rightarrow \ell _{q /2}^m \Vert ^{1/2} = \sup _{x\in B_{p/2}^n} \Bigl ( \sum _{i=1}^m \Bigl |\sum _{j=1}^n x_j\Bigr |^{q/2} \Bigr )^{1/q} \\&= m^{1/q} \sup _{x\in B_{p/2}^n} \Bigl ( \Bigl |\sum _{j=1}^n x_j\Bigr | \Bigr )^{1/2} = m^{1/q} n^{\frac{1}{2(p/2)^*}} \\&\gg m^{1/q} + \sqrt{\ln m} \, n^{\frac{1}{2(p/2)^*}}. \end{aligned}$$

5.5 Infinite dimensional Gaussian operators

In this subsection we prove Proposition 1.2 concerning infinite dimensional Gaussian operators. It allows us to see that Conjecture 1 implies Conjecture 2.

Proof of Proposition 1.2

We adapt the proof of [40, Corollary 1.2] to prove Proposition 1.2 in the case \(p\le 2\le q\); the remaining cases may be proven similarly. Fix \(1\le p \le 2\le q\le \infty \) for which (1.14) holds and a deterministic infinite matrix \(A=(a_{ij})_{i,j\in \mathbb {N}}\). Using the monotone convergence theorem one can show that a matrix \(B = (b_{ij})_{i,j\in \mathbb {N}}\) defines a bounded operator between \(\ell _p(\mathbb {N})\) and \(\ell _q(\mathbb {N})\) if and only if \(\sup _{n\in \mathbb {N}} \Vert (b_{ij})_{i,j\le n}:\ell _p^n\rightarrow \ell _q^n\Vert < \infty \). Interpreting \(\Vert B:\ell _p(\mathbb {N})\rightarrow \ell _q(\mathbb {N})\Vert \) as \(\infty \) for matrices which do not define a bounded operator, we have

$$\begin{aligned}&\mathbb {E}\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert = \mathbb {E}\sup _{x\in B_p^\infty } \biggl ( \sum _{i=1}^\infty \Bigl | \sum _{j=1}^\infty a_{ij}g_{ij}x_j \Bigr |^q \biggr )^{1/q} \\&\quad = \mathbb {E}\lim _{n\rightarrow \infty } \sup _{x\in B_p^n} \biggl ( \sum _{i=1}^n \Bigl | \sum _{j=1}^n a_{ij}g_{ij}x_j \Bigr |^q \biggr )^{1/q} = \lim _{n\rightarrow \infty } \mathbb {E}\sup _{x\in B_p^n} \biggl ( \sum _{i=1}^n \Bigl | \sum _{j=1}^n a_{ij}g_{ij}x_j \Bigr |^q \biggr )^{1/q} \\&\quad = \lim _{n\rightarrow \infty } \mathbb {E}\bigl \Vert (g_{ij}a_{ij})_{i,j\le n} :\ell _p^n \rightarrow \ell _q^n \bigr \Vert \end{aligned}$$

and similarly

$$\begin{aligned} \Vert A\mathbin {\circ }A :\ell _{p/2}(\mathbb {N}) \rightarrow \ell _{q/2}(\mathbb {N}) \Vert&= \lim _{n\rightarrow \infty } \Vert (a_{ij}^2)_{i,j\le n} :\ell ^n_{p/2} \rightarrow \ell ^n_{q/2} \Vert ,\\ \Vert (A\mathbin {\circ }A)^T :\ell _{q^*/2}(\mathbb {N}) \rightarrow \ell _{p^*/2}(\mathbb {N}) \Vert&= \lim _{n\rightarrow \infty } \Vert (a_{ji}^2)_{i,j\le n} :\ell ^n_{q^*/2} \rightarrow \ell ^n_{p^*/2} \Vert , \end{aligned}$$

and

$$\begin{aligned} \mathbb {E}\sup _{i, j \in \mathbb {N}}|a_{ij}g_{ij}|&= \lim _{n\rightarrow \infty } \mathbb {E}\sup _{i, j \le n}|a_{ij}g_{ij}|. \end{aligned}$$

Therefore, (1.14) implies the following: \(\mathbb {E}\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert <\infty \) if and only if \(\Vert A\mathbin {\circ }A :\ell _{p/2}(\mathbb {N}) \rightarrow \ell _{q/2}(\mathbb {N}) \Vert <\infty \), \(\Vert (A\mathbin {\circ }A)^T :\ell _{q^*/2}(\mathbb {N}) \rightarrow \ell _{p^*/2}(\mathbb {N}) \Vert <\infty \), and \(\mathbb {E}\sup _{i, j \in \mathbb {N}}|a_{ij}g_{ij}| <\infty \). It thus suffices to prove the following claim: \(\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert <\infty \) almost surely if and only if \(\mathbb {E}\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert <\infty \).

If \(\mathbb {P}(\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert<\infty ) <1\), then \(\mathbb {P}(\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert =\infty ) >0\), so \(\mathbb {E}\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert =\infty \).

Assume now that \(\mathbb {P}(\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert <\infty ) =1\). By (4.23) and (4.24) we know that for every \(n\in \mathbb {N}\) there exist finite sets \(S_n\) and \(T_n\) such that

$$\begin{aligned} \Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert&= \sup _{n\in \mathbb {N}} \sup _{x\in B_p^n, y\in B_{q^*}^n} \sum _{i=1}^n \sum _{j=1}^n y_i a_{ij} g_{ij} x_j \\ {}&\asymp \sup _n \sup _{x\in S_n, y\in T_n} \sum _{i=1}^n \sum _{j=1}^n y_i a_{ij} g_{ij} x_j \qquad \text {a.s.} \end{aligned}$$

In particular, there exist Gaussian random variables \((\Gamma _k)_{k\in \mathbb {N}}\) such that

$$\begin{aligned} \Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert \asymp \sup _{k\in \mathbb {N}} \Gamma _k \qquad \text {a.s.} \end{aligned}$$

Therefore, we may apply [35, (1.2)] to see that there exists \(\varepsilon >0\) such that \(\mathbb {E}\exp (\varepsilon \Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert ^2 ) < \infty \), so \(\mathbb {E}\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert <\infty \), which completes the proof of the claim. \(\square \)