Uniform recovery in infinite-dimensional compressed sensing and applications to structured binary sampling

Infinite-dimensional compressed sensing deals with the recovery of analog signals (functions) from linear measurements, often in the form of integral transforms such as the Fourier transform. This framework is well-suited to many real-world inverse problems, which are typically modelled in infinite-dimensional spaces, and where the application of finite-dimensional approaches can lead to noticeable artefacts. Another typical feature of such problems is that the signals are not only sparse in some dictionary, but possess a so-called local sparsity in levels structure. Consequently, the sampling scheme should be designed so as to exploit this additional structure. In this paper, we introduce a series of uniform recovery guarantees for infinite-dimensional compressed sensing based on sparsity in levels and so-called multilevel random subsampling. By using a weighted $\ell^1$-regularizer we derive measurement conditions that are sharp up to log factors, in the sense that they agree with those of certain oracle estimators. These guarantees also apply in finite dimensions, and improve existing results for unweighted $\ell^1$-regularization. To illustrate our results, we consider the problem of binary sampling with the Walsh transform using orthogonal wavelets. Binary sampling is an important mechanism for certain imaging modalities. By carefully estimating the local coherence between the Walsh and wavelet bases, we derive the first known recovery guarantees for this problem.


Introduction
Compressive sensing (CS), introduced by Candès, Romberg and Tao in [10] and Donoho in [14], has been an area of substantial research during the last decade. The key assumption, which lays the foundation for this field of research, is that a sparse vector $x \in \mathbb{C}^N$ can be recovered from an underdetermined system of linear equations using, for instance, convex optimization algorithms [15,16].
Imaging has been one of the most successful areas of application of CS. However, in this area the sparsity assumption alone is typically too general. Examples include all applications using Fourier samples, such as Magnetic Resonance Imaging (MRI) [22,24,25], surface scattering [21], Computerized Tomography (CT) and electron microscopy, as well as applications using binary sampling, e.g. fluorescence microscopy [29], lensless imaging [33] and numerous other optical imaging modalities [6,17,32]. Natural images, when sparsified via a wavelet (or, more generally, X-let) transform, are not merely sparse, but have a specific sparsity structure [3,27]. For wavelets, which will be our sparsifying transform in this paper, natural images have coefficients where most of the large entries are concentrated at the coarse scales, with progressively fewer at the fine scales (termed asymptotic sparsity in [3]).
In the presence of structured sparsity, it is natural to ask how best to promote this additional structure. In [3] it was proposed to do this via the sampling operator. Wavelets partition Fourier space into dyadic bands corresponding to distinct scales. Hence, by choosing Fourier samples in these bands according to the local sparsities, one obtains a structured sampling scheme, a so-called multilevel sampling scheme, which promotes the asymptotic sparsity structure. The practical benefits of such schemes have been demonstrated in [27] for various imaging modalities, including MRI, Nuclear Magnetic Resonance (NMR) spectroscopy, fluorescence microscopy and Helium Atom Scattering. Theoretical analysis has been presented in [3] (nonuniform recovery) and [7,23] (uniform recovery in the finite-dimensional setting).

Main results
This paper has two main objectives. First, we generalize existing uniform recovery guarantees [7,23] from the finite-dimensional to the infinite-dimensional setting. This extension is important for practical imaging. Although much of the compressive imaging literature considers the recovery of discrete images (i.e. finite-dimensional arrays) from discrete measurements (e.g. the discrete Fourier transform), modalities such as MRI, NMR and others are naturally analog, and hence better modelled over the continuum (i.e. functions, and the continuous Fourier transform). Indeed, as we will see in Section 2.3, discretizing such a problem leads to measurement mismatch [11] and, in the case of wavelet recovery, the wavelet crime [28, p. 232], both of which can introduce artefacts in the reconstruction [19]. In this paper, we consider signals as functions $f \in L^2([0,1))$ and work with continuous integral transforms, thus avoiding these pitfalls.
In our theoretical analysis, we also improve on the uniform recovery guarantees given in previous works [7,23]. Unlike previous results, our recovery guarantees are, up to log factors, optimal: specifically, they agree with those of the oracle least-squares estimator based on a priori knowledge of the support [1]. We do this by replacing the standard $\ell^1$-minimization decoder with a certain weighted $\ell^1$-minimization decoder, an idea originally proposed in [31].
Our second objective is to consider binary sampling. Previous works have addressed the case of (discrete or continuous) Fourier sampling. Yet many imaging modalities, e.g. fluorescence microscopy and lensless imaging, require binary sampling operators. To accommodate these, we replace the Fourier transform with the Walsh transform
$$f \mapsto \Big\{\int_0^1 f(x)\, w_n(x)\,\mathrm{d}x\Big\}_{n \in \mathbb{Z}_+}, \quad f \in L^2([0,1)),$$
where $w_n : [0,1) \to \{+1,-1\}$, $n \in \mathbb{Z}_+ := \{0,1,\ldots\}$, denote the Walsh functions. This is a widely used sampling operator in binary imaging [29,33], and often goes under the name of Hadamard sampling in the discrete case. Working with this continuous transform, we provide guarantees for binary sampling analogous to those for Fourier sampling. As a side note, we remark that working in the continuous setting also simplifies the analysis (specifically, the derivation of so-called local coherence estimates) over working directly with the discrete setup. We note that in this paper we only consider recovery guarantees for one-dimensional functions. We expect that the setup for higher-dimensional functions will deviate slightly from what we present here, and we save this discussion for future work.
The outline of the remainder of this paper is as follows. We commence in Section 2 by reviewing previous work, in particular the existing finite-dimensional theory. We then introduce an abstract infinite-dimensional model for isometries $U$ acting on $\ell^2(\mathbb{N})$ in Section 3. Here we derive sufficient conditions for such operators to provide uniform recovery guarantees. In Section 4 we continue this work by finding conditions under which the cross-Gramian $U$ between a wavelet basis and the Walsh basis satisfies these conditions. Finally, in the remaining sections we present the proofs of our main results.
Sparsity in levels in finite dimensions

Notation
For $N \in \mathbb{N}$ and $\Omega \subseteq \{1,\ldots,N\}$ we let $P_\Omega \in \mathbb{C}^{N\times N}$ denote the projection onto the linear span of the associated subset of the canonical basis, i.e. for $x \in \mathbb{C}^N$ we have $(P_\Omega x)_i = x_i$ if $i \in \Omega$ and $(P_\Omega x)_i = 0$ if $i \notin \Omega$. Sometimes we will abuse this notation slightly by assuming $P_\Omega \in \mathbb{C}^{|\Omega|\times N}$, discarding all the zero entries in $P_\Omega x$. Whether we mean $P_\Omega \in \mathbb{C}^{N\times N}$ or $P_\Omega \in \mathbb{C}^{|\Omega|\times N}$ will be clear from the context. If $\Omega = \{N_{k-1}+1,\ldots,N_k\}$ we simply write $P_{N_{k-1}}^{N_k} = P_{\{N_{k-1}+1,\ldots,N_k\}}$, and simply $P_{N_k}$ if $N_{k-1} = 0$.
We call a vector $x \in \mathbb{C}^N$ $s$-sparse if $|\mathrm{supp}(x)| \le s$, where $\mathrm{supp}(x) = \{i : x_i \ne 0\}$. We write $A \lesssim B$ if there exists a constant $C > 0$, independent of all relevant parameters, such that $A \le CB$, and similarly for $A \gtrsim B$.
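To make the notation concrete, here is a minimal pure-Python sketch (the function names `project` and `is_sparse` are our own, for illustration only) of the $\mathbb{C}^{|\Omega|\times N}$ form of $P_\Omega$ and the $s$-sparsity test:

```python
def project(omega, x):
    """P_Omega in its |Omega| x N form: keep the entries of x indexed by omega (1-based)."""
    return [x[i - 1] for i in sorted(omega)]

def is_sparse(x, s, tol=0.0):
    """Check |supp(x)| <= s, where supp(x) = {i : x_i != 0}."""
    return sum(1 for v in x if abs(v) > tol) <= s

x = [0.0, 3.0, 0.0, -1.5, 0.0, 0.0]
print(project({2, 4}, x))   # [3.0, -1.5]: the nonzero part of x
print(is_sparse(x, 2))      # True: |supp(x)| = 2
```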

Finite model
Let $V \in \mathbb{C}^{N\times N}$ be a measurement matrix, e.g. a Fourier or Hadamard matrix, denoted $V_{\mathrm{Four}}$ and $V_{\mathrm{Had}}$ respectively, and let $\Omega \subset \{1,\ldots,N\}$ with $|\Omega| = m < N$. In a typical finite-dimensional CS setup we consider the recovery of a signal $x \in \mathbb{C}^N$ from measurements $y = P_\Omega V x + e \in \mathbb{C}^m$, where $e \in \mathbb{C}^m$ is a vector of measurement error. If $x$ is sparse in a discrete wavelet basis, one then recovers its coefficients by solving the optimization problem
$$\min_z \|z\|_1 \ \text{subject to}\ \|P_\Omega V \Psi^{-1} z - y\|_2 \le \eta, \tag{2.1}$$
where $\Psi \in \mathbb{C}^{N\times N}$ is a discrete wavelet transform and $\eta \ge \|e\|_2$ is a noise parameter. Usually one scales $V \in \mathbb{C}^{N\times N}$ so that it becomes orthonormal and chooses an orthonormal wavelet basis, so that the matrix $U = V\Psi^{-1} = V\Psi^T$ acts as an isometry on $\mathbb{C}^N$. Suppose that $U$ is indeed an isometry. To obtain a uniform recovery guarantee for the above system, one typically first shows that the matrix $A = \frac{1}{\sqrt{p}} P_\Omega U \in \mathbb{C}^{m\times N}$, with $p = m/N$, satisfies the Restricted Isometry Property (RIP) with high probability.

Definition 2.1 (RIP). Let $1 \le s \le N$ and $A \in \mathbb{C}^{m\times N}$. The Restricted Isometry Constant (RIC) of order $s$ is the smallest $\delta \ge 0$ such that
$$(1-\delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1+\delta)\|x\|_2^2 \quad \forall x \in \Sigma_s,$$
where $\Sigma_s$ denotes the set of $s$-sparse vectors in $\mathbb{C}^N$. If $0 \le \delta < 1$ we say that $A$ has the Restricted Isometry Property (RIP) of order $s$.
For an isometry $U \in \mathbb{C}^{N\times N}$, the question of whether or not $P_\Omega U$ satisfies the RIP is related to the so-called coherence of $U$:

Definition 2.3 (Coherence). Let $U \in \mathbb{C}^{N\times N}$ be an isometry. The coherence of $U$ is
$$\mu(U) = \max_{i,j=1,\ldots,N} |U_{ij}|^2 \in [N^{-1}, 1].$$

Theorem 2.4 ([16, Thm. 12.32]). Let $U \in \mathbb{C}^{N\times N}$ be an isometry and let $0 < \delta, \epsilon < 1$. Suppose that $\Omega = \{t_1,\ldots,t_m\} \subseteq \{1,\ldots,N\}$, where each $t_k$ is chosen uniformly and independently at random from the set $\{1,\ldots,N\}$. If
$$m \gtrsim \delta^{-2}\cdot s\cdot N\cdot\mu(U)\cdot\big(\log(2m)\log(2N)\log^2(2s) + \log(\epsilon^{-1})\big),$$
then with probability $1-\epsilon$ the matrix $A = \frac{1}{\sqrt{p}} P_\Omega U \in \mathbb{C}^{m\times N}$, with $p = m/N$, satisfies the RIP of order $s$ with $\delta_s \le \delta$.
(We slightly abuse notation here in that we allow for possible repeats of the values $t_i$ that make up $\Omega$.) Thus if the coherence $\mu(U) \approx N^{-1}$, we obtain the RIP of order $s$ using approximately $s$ measurements, up to constants and log factors.
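The bound above scales with $N\mu(U)$, so the best case is a perfectly incoherent isometry with $\mu(U) = N^{-1}$. The unitary DFT matrix attains this; the following standard-library sketch (illustrative only) checks it numerically:

```python
import cmath

def dft_matrix(N):
    """Unitary DFT matrix U with U[i][j] = exp(2*pi*1j*i*j/N) / sqrt(N)."""
    return [[cmath.exp(2j * cmath.pi * i * j / N) / N ** 0.5 for j in range(N)]
            for i in range(N)]

def coherence(U):
    """mu(U) = max_{i,j} |U_ij|^2; lies in [1/N, 1] for an N x N isometry."""
    return max(abs(v) ** 2 for row in U for v in row)

N = 16
mu = coherence(dft_matrix(N))
print(abs(mu - 1 / N) < 1e-12)   # True: the DFT is perfectly incoherent
```

By contrast, the identity matrix (an isometry with a single nonzero per row) has coherence $1$, the worst case.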
There are, however, two problems with this approach. First, in our setup, where $U = V\Psi^T$ is the product of a Fourier or Hadamard matrix and a discrete wavelet transform, the coherence $\mu(U) \approx 1$. Hence satisfying the RIP requires at least $m \approx N$ measurements. Second, the RIP asserts recovery for all $s$-sparse vectors of wavelet coefficients, and thus does not exploit any additional structure these coefficients possess. However, as noted above, wavelet coefficients are highly structured: large wavelet coefficients tend to cluster at coarse scales, with coefficients at fine scales being increasingly sparse.
Definition 2.5 (Sparsity in levels). Let $M = (M_1,\ldots,M_r) \in \mathbb{N}^r$, where $1 \le M_1 < \cdots < M_r$, and let $s = (s_1,\ldots,s_r) \in \mathbb{N}^r$ with $s_k \le M_k - M_{k-1}$ for $k = 1,\ldots,r$, where $M_0 = 0$. A vector $x \in \mathbb{C}^{M_r}$ satisfying
$$|\mathrm{supp}(x) \cap \{M_{k-1}+1,\ldots,M_k\}| \le s_k, \quad k = 1,\ldots,r,$$
is called $(s,M)$-sparse, where $s$ and $M$ are called the local sparsities and sparsity levels, respectively. We denote the set of all $(s,M)$-sparse vectors by $\Sigma_{s,M}$.
As noted above, randomly subsampling an isometry $U$ is a poor measurement protocol for coherent problems such as Fourier–wavelets. Instead, in [3] it was proposed to sample in the following structured way:

Definition 2.6 (Multilevel random subsampling). Let $N = (N_1,\ldots,N_r) \in \mathbb{N}^r$, where $1 \le N_1 < \cdots < N_r = N$, and let $m = (m_1,\ldots,m_r) \in \mathbb{N}^r$ with $m_k \le N_k - N_{k-1}$ for $k = 1,\ldots,r$, where $N_0 = 0$. For each $k = 1,\ldots,r$, let $\Omega_k = \{N_{k-1}+1,\ldots,N_k\}$ if $m_k = N_k - N_{k-1}$; otherwise, let $t_{k,1},\ldots,t_{k,m_k}$ be chosen uniformly and independently from the set $\{N_{k-1}+1,\ldots,N_k\}$, and set $\Omega_k = \{t_{k,1},\ldots,t_{k,m_k}\}$. If $\Omega = \Omega_{N,m} = \Omega_1 \cup \cdots \cup \Omega_r$ we refer to $\Omega$ as an $(N,m)$-multilevel subsampling scheme.

For this structured model, the following extension of the RIP was first introduced in [7].

Definition 2.7 (RIPL). Let $s, M \in \mathbb{N}^r$ be given local sparsities and sparsity levels, respectively. For a matrix $A \in \mathbb{C}^{m\times N}$, the Restricted Isometry Constant in Levels (RICL) of order $(s,M)$, denoted $\delta_{s,M}$, is the smallest $\delta \ge 0$ such that
$$(1-\delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1+\delta)\|x\|_2^2 \quad \forall x \in \Sigma_{s,M}.$$
We say that $A$ has the Restricted Isometry Property in Levels (RIPL) if $0 \le \delta < 1$.
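A multilevel random subsampling scheme as in Definition 2.6 is straightforward to draw in code. The following sketch (pure Python; `multilevel_scheme` is our own illustrative name, and repeated draws are allowed, as in the definition) saturates a level when $m_k = N_k - N_{k-1}$ and otherwise samples uniformly and independently:

```python
import random

def multilevel_scheme(N_levels, m, seed=0):
    """Draw an (N, m)-multilevel subsampling scheme Omega.

    N_levels = [N_1, ..., N_r] with N_r = N; m = [m_1, ..., m_r].
    Level k draws m_k indices uniformly (with replacement) from
    {N_{k-1}+1, ..., N_k}, or takes the whole level if m_k equals its size.
    """
    rng = random.Random(seed)
    omega, N_prev = set(), 0
    for N_k, m_k in zip(N_levels, m):
        band = range(N_prev + 1, N_k + 1)
        if m_k == len(band):
            omega.update(band)            # fully sample this level
        else:
            omega.update(rng.choice(list(band)) for _ in range(m_k))
        N_prev = N_k
    return omega

omega = multilevel_scheme([2, 4, 8, 16], [2, 2, 2, 2])
print(omega.issubset(set(range(1, 17))))  # True
```

Note that the first level here is fully sampled, since $m_1 = N_1 - N_0 = 2$; this mirrors the saturation condition used later for the low frequencies.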
We shall see that this leads to uniform recovery of all $(s,M)$-sparse vectors, but first we define the best $(s,M)$-term approximation error of $x \in \mathbb{C}^N$:
$$\sigma_{s,M}(x)_p := \inf\{\|x - z\|_p : z \in \Sigma_{s,M}\}.$$

Theorem 2.8 ([7, Thm. 4.4]). Let $s, M \in \mathbb{N}^r$ be local sparsities and sparsity levels, respectively. Let $\alpha_{s,M} = \max_{k,l=1,\ldots,r} s_l/s_k$ and $s = s_1 + \cdots + s_r$. Suppose that the RICL $\delta_{2s,M} \ge 0$ of the matrix $A \in \mathbb{C}^{m\times M}$ satisfies
$$\delta_{2s,M} < \big(r(\sqrt{\alpha_{s,M}} + 1/4)^2 + 1\big)^{-1/2}.$$
Then, for $x \in \mathbb{C}^M$ and $e \in \mathbb{C}^m$ with $\|e\|_2 \le \eta$, any solution $\hat{x}$ of
$$\min_z \|z\|_1 \ \text{subject to}\ \|Az - Ax - e\|_2 \le \eta$$
satisfies
$$\|\hat{x} - x\|_1 \le C\,\sigma_{s,M}(x)_1 + C'\sqrt{s}\,\eta \quad \text{and} \quad \|\hat{x} - x\|_2 \le D\,\frac{\sigma_{s,M}(x)_1}{\sqrt{s}} + D'\,\eta,$$
where $C, C', D, D' > 0$ are constants which depend only on $\delta_{2s,M}$.
In [23] the authors investigated conditions under which a subsampled isometry $U \in \mathbb{C}^{N\times N}$ satisfies the RIPL. It was shown that the number of samples required to satisfy the RIPL is related to the so-called local coherence properties of $U$:

Definition 2.9. Let $U \in \mathbb{C}^{N\times N}$ be an isometry and let $N, M \in \mathbb{N}^r$ be given sampling and sparsity levels. The local coherences of $U$ are
$$\mu_{k,l} = \mu\big(P_{N_{k-1}}^{N_k} U P_{M_{l-1}}^{M_l}\big), \quad k,l = 1,\ldots,r.$$

Theorem 2.10 ([23, Thm. 3.2]). Let $U \in \mathbb{C}^{N\times N}$ be an isometry. Let $r \in \mathbb{N}$, $0 < \delta, \epsilon < 1$ and $0 \le r_0 \le r$. Let $\Omega = \Omega_{N,m}$ be an $(N,m)$-multilevel random subsampling scheme. Let $m = m_{r_0+1} + \cdots + m_r$ and $s = s_1 + \cdots + s_r$. Suppose that the $m_k$'s satisfy
$$m_k = N_k - N_{k-1}, \quad k = 1,\ldots,r_0, \tag{2.3}$$
and
$$m_k \gtrsim \delta^{-2}\cdot(N_k - N_{k-1})\cdot\Big(\sum_{l=1}^r \mu_{k,l}\, s_l\Big)\cdot\big(r\log(2m)\log(2N)\log^2(2s) + \log(\epsilon^{-1})\big)$$
for $k = r_0+1,\ldots,r$. Then, with probability at least $1-\epsilon$, the matrix
$$A = \Big(\tfrac{1}{\sqrt{p_k}}\, P_{\Omega_k} U\Big)_{k=1}^r \in \mathbb{C}^{m\times N}, \qquad p_k = \frac{m_k}{N_k - N_{k-1}}, \tag{2.5}$$
satisfies the RIPL of order $(s,M)$ with constant $\delta_{s,M} \le \delta$.

This theorem characterizes the number of local measurements $m_k$ needed to ensure uniform recovery explicitly in terms of the local sparsities $s_k$ and the local coherences $\mu_{k,l}$. In particular, if the local coherences are suitably well-behaved, then recovery may still be possible from highly subsampled measurements, even though the global coherence may be high (see next). Note that the condition (2.3), whereby the first $r_0$ sampling levels are saturated, models practical imaging scenarios where the low Fourier frequencies are typically fully sampled.
To illustrate this theorem, in [4] the authors consider the one-dimensional discrete Fourier sampling problem with sparsity in Haar wavelets. For the Haar wavelet basis we choose an ordering where the first level $\{M_0+1,\ldots,M_1\} = \{1,2\}$ consists of the scaling function and the mother wavelet, and the subsequent levels are chosen so that $\{M_{l-1}+1,\ldots,M_l\} = \{2^{l-1}+1,\ldots,2^l\}$ consists of the wavelets at scale $l-1$. This gives the sparsity levels $M = (2^1, 2^2, \ldots, 2^r)$, where $r = \log_2(N)$ (assumed to be an integer). Next we define the entries of the Fourier matrix $V_{\mathrm{Four}} \in \mathbb{C}^{N\times N}$ as
$$(V_{\mathrm{Four}})_{\omega,n} = \frac{1}{\sqrt{N}}\, e^{2\pi i \omega n/N}, \quad \omega \in \{-N/2+1,\ldots,N/2\}, \; n \in \{0,\ldots,N-1\},$$
where we have started the ordering of the rows with negative indices for convenience. We define the sampling levels for the frequencies $\omega$ in dyadic bands, with $W_1 = \{0,1\}$ and
$$W_{k+1} = \{-2^k+1,\ldots,-2^{k-1}\} \cup \{2^{k-1}+1,\ldots,2^k\}, \quad k = 1,\ldots,r-1.$$
Notice that for a suitable reordering of the rows of $V_{\mathrm{Four}}$ these bands correspond to the sampling levels $N = (2^1, 2^2, \ldots, 2^r)$. In [4] the following was shown. Let $N = 2^r$ for some $r \ge 1$ and let $U = V_{\mathrm{Four}}\Psi^{-1} \in \mathbb{C}^{N\times N}$, where $\Psi$ is the Haar wavelet matrix. Let $0 < \delta, \epsilon < 1$ and let $N = M = (2^1,\ldots,2^r)$. Let $m = m_1 + \cdots + m_r$ and $s = s_1 + \cdots + s_r$. For each $k = 1,\ldots,r$, suppose we draw $m_k$ Fourier samples from the band $W_k$ randomly and independently, where
$$m_k \gtrsim \delta^{-2}\cdot\Big(\sum_{l=1}^r 2^{-|k-l|}\, s_l\Big)\cdot\big(r\log(2m)\log(2N)\log^2(2s) + \log(\epsilon^{-1})\big).$$
Then with probability at least $1-\epsilon$ the matrix (2.5) satisfies the RIPL with constant $\delta_{s,M} \le \delta$.
Here, for convenience, we have taken r 0 " 0; see [23] for further discussion on this point.
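The dyadic frequency bands $W_k$ from the Haar example above can be generated as follows (a small pure-Python sketch, illustrative only); note that they partition the $N = 2^r$ frequencies $\{-N/2+1,\ldots,N/2\}$ and that $|W_{k+1}| = 2^k$:

```python
def dyadic_bands(r):
    """W_1 = {0,1}; W_{k+1} = {-2^k+1,...,-2^(k-1)} U {2^(k-1)+1,...,2^k}."""
    bands = [set(range(0, 2))]
    for k in range(1, r):
        neg = set(range(-2 ** k + 1, -2 ** (k - 1) + 1))
        pos = set(range(2 ** (k - 1) + 1, 2 ** k + 1))
        bands.append(neg | pos)
    return bands

r = 4
bands = dyadic_bands(r)
all_freqs = set().union(*bands)
print(all_freqs == set(range(-2 ** (r - 1) + 1, 2 ** (r - 1) + 1)))  # True: a partition
print([len(W) for W in bands])  # [2, 2, 4, 8]: band sizes match the levels (2^1,...,2^r)
```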

Shortcomings
These results have two primary shortcomings, which we now discuss in further detail. The key issue is that they are limited to finite dimensions. As noted in Section 1, applying finite-dimensional recovery procedures to analog problems can result in artefacts. For simplicity, let $N = 2^p$. We have argued that analog signals should be modelled as elements of $L^2([0,1))$, rather than $\mathbb{C}^N$. Yet, above, we have tried to use discrete tools to recover the signal $f \in L^2([0,1))$ by replacing $Wf$ and $Ff$ with $V_{\mathrm{Had}}$ and $V_{\mathrm{Four}}$, respectively. Next we argue that this construction leads to both measurement mismatch and the wavelet crime.
Let $\chi_{[a,b)}$ denote the indicator function of the interval $[a,b)$ and set $\Delta_{k,p} = [k2^{-p}, (k+1)2^{-p})$. We see that replacing $Wf$ with $V_{\mathrm{Had}} \in \mathbb{C}^{N\times N}$ is equivalent to replacing $f$ by, e.g., $\tilde{f} = \sum_{k=0}^{N-1} c_k \chi_{\Delta_{k,p}}$ for some $c \in \mathbb{C}^N$, since $W\tilde{f} = V_{\mathrm{Had}} c$. Clearly, $W\tilde{f}$ will be a poor approximation to $Wf$. We refer to this as measurement mismatch.
Next let $\phi^0, \phi^1$ denote a scaling function and wavelet, respectively, and set $\phi^s_{j,k} = 2^{j/2}\phi^s(2^j\cdot - k)$ for $s \in \{0,1\}$. By construction, the solution $\hat{x}$ of (2.1) will be the coefficients of a function $\tilde{f}$ written in a basis consisting of both wavelets and scaling functions. Equivalently, we can represent $\tilde{f}$ in the basis $\{\phi^0_{p,k}\}_{k=0}^{N-1}$ using the coefficients $c = \Psi^{-1}\hat{x} \in \mathbb{C}^N$. The wavelet crime occurs whenever we let $c_k$ represent pointwise samples of $f$, i.e. $c_k = f(k/N)$.
What does this mean for reconstruction? To illustrate the issue we provide an example similar to the first numerical simulation in [2], showing how finite-dimensional compressed sensing fails to recover even a function that is 1-sparse (meaning it has only one non-zero coefficient) in its wavelet decomposition. Indeed, in Figure 1 we consider the problem of recovering a function $f$ from samples of the continuous Walsh transform. In particular, we choose $f(t) = \phi_{4,4}(t)$, where $\phi$ is the Daubechies scaling function corresponding to the wavelet with four vanishing moments. Figure 1 shows the poor performance of CS using the discrete finite-dimensional setup when applied to a continuous problem. Conversely, the infinite-dimensional CS approach, which we develop in the next sections, gives a much higher-fidelity reconstruction from exactly the same samples as used in the finite-dimensional case. In fact, the infinite-dimensional CS reconstruction recovers $f$ perfectly, up to numerical errors arising from solving the optimization problem. We also observe a slightly paradoxical phenomenon in the finite-dimensional case: more samples do not improve performance. This is because the finite-dimensional CS solution with full sampling coincides with the truncated Walsh series (direct inversion) approximation. This approximation is clearly highly suboptimal, as demonstrated in Figure 1.
We note in passing that the above crimes stem from discretizing the inverse problem too early. Our infinite-dimensional CS approach replaces $V_{\mathrm{Had}}\Psi^{-1}$ by a finite section of an isometry $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ representing the change of basis between the continuous Fourier or Walsh transform and the wavelet basis.
On a related note, even if one were to ignore the above issues, estimating the local coherences $\mu_{k,l}$ in the discrete setting for anything but the Haar wavelet becomes extremely complicated. Conversely, by moving to the continuous setting, these estimates become much easier to derive. We do this later in the paper for arbitrary Daubechies wavelets.

The second shortcoming concerns the measurement condition. Theorem 2.8 guarantees recovery of all sparse signals provided the matrix $A \in \mathbb{C}^{m\times M}$ satisfies the RIPL with constant
$$\delta_{2s,M} < \big(r(\sqrt{\alpha_{s,M}} + 1/4)^2 + 1\big)^{-1/2}.$$
Here $r$ is the number of levels and $\alpha_{s,M} = \max_{k,l=1,\ldots,r} s_l/s_k$ is the sparsity ratio. Inserting the above inequality into Theorem 2.10 gives a sampling condition of the form
$$m_k \gtrsim \alpha_{s,M}\cdot r\cdot(N_k - N_{k-1})\cdot\Big(\sum_{l=1}^r \mu_{k,l}\, s_l\Big)\cdot L,$$
where $L$ denotes the log factors. This means that the sparsity ratio $\alpha_{s,M}$ affects the sampling condition in all sampling levels. Thus, for signals where we expect the local sparsities to vary greatly from level to level (e.g. wavelets), this leads to an unreasonably high number of samples.
To overcome this problem, using an idea from [31], we replace the $\ell^1$-regularizer in the optimization problem (2.1) with a weighted $\ell^1$-regularizer. For a suitable choice of weights, this removes the factor $\alpha_{s,M}$ from the various measurement conditions. As we show, the resulting guarantees are optimal up to constants and log factors.

Extensions to infinite dimensions

Setup
We will continue with the notation we introduced above, extended to infinite dimensions. That is, we assume that the signal $f$ is an element of $L^2([0,1))$. We still let $P_\Omega$ denote the projection onto the canonical basis, but we now let it be an element of either $\mathcal{B}(\ell^2(\mathbb{N}))$ or $\mathcal{B}(\ell^2(\mathbb{N}), \mathbb{C}^{|\Omega|})$. Similarly, we call a vector $x \in \ell^2(\mathbb{N})$ $(s,M)$-sparse if $P_M x$ is $(s,M)$-sparse and $P_M^\perp x = 0$. Here $M = M_r$, and we refer to it as the sparsity bandwidth of $x$. For an isometry $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ we define the coherence of $U$ as $\mu(U) = \sup\{|U_{ij}|^2 : i,j \in \mathbb{N}\}$.
Next we describe the setup for a general sampling basis $B^{\mathrm{sa}} = \{b_1^{\mathrm{sa}}, b_2^{\mathrm{sa}}, b_3^{\mathrm{sa}}, \ldots\}$ and a sparsifying basis $B^{\mathrm{sp}} = \{b_1^{\mathrm{sp}}, b_2^{\mathrm{sp}}, b_3^{\mathrm{sp}}, \ldots\}$, both assumed to be orthonormal bases of $L^2([0,1))$. In Section 4 we will specialize this so that $B^{\mathrm{sa}}$ is the Walsh sampling basis and $B^{\mathrm{sp}}$ is a wavelet sparsifying basis. This will enable us to derive concrete recovery guarantees for $f$. The setup below is, however, completely general.
For the two bases $B^{\mathrm{sa}}$ and $B^{\mathrm{sp}}$ we can represent $f$ using the coefficients $y = \{\langle f, b_n^{\mathrm{sa}}\rangle\}_{n\in\mathbb{N}}$ and $x = \{\langle f, b_n^{\mathrm{sp}}\rangle\}_{n\in\mathbb{N}}$, respectively. To change the representation from $B^{\mathrm{sa}}$ to $B^{\mathrm{sp}}$, we define the following matrix.
Definition 3.1. Let $B^{\mathrm{sa}} = \{b_n^{\mathrm{sa}}\}_{n\in\mathbb{N}}$ and $B^{\mathrm{sp}} = \{b_n^{\mathrm{sp}}\}_{n\in\mathbb{N}}$ be orthonormal bases for $L^2([0,1))$. The change-of-basis matrix $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ between $B^{\mathrm{sa}}$ and $B^{\mathrm{sp}}$ is the infinite matrix with entries
$$U_{ij} = \langle b_j^{\mathrm{sp}}, b_i^{\mathrm{sa}}\rangle, \quad i,j \in \mathbb{N}.$$
We will denote this matrix by $U = [B^{\mathrm{sa}}, B^{\mathrm{sp}}]$.
Notice in particular that, since $B^{\mathrm{sa}}$ and $B^{\mathrm{sp}}$ are orthonormal, $U = [B^{\mathrm{sa}}, B^{\mathrm{sp}}]$ is an isometry on $\ell^2(\mathbb{N})$, and we can write $y = Ux$.
Next let Ω " Ω m,N be a given multilevel random sampling scheme with |Ω| " m. We refer to N " N r as the sampling bandwidth of Ω (as discussed later, this will be chosen in terms of sampling bandwidth to ensure stable truncation of U ). Now define the matrix ?
and we use the slightly unusual notation C mˆ8 for the operators Bp 2 pNq, C m q. Due to the scaling factors 1{ ? p k we consider scaled noisy measurements where D is a diagonal matrix with the corresponding scaling factors found in H along the diagonal and e is the measurement noise.
Suppose that $x$ is approximately $(s,M)$-sparse with sparsity bandwidth $M$. It is tempting to form the finite matrix $A = HP_M \in \mathbb{C}^{m\times M}$ and solve the minimization problem
$$\min_z \|z\|_1 \ \text{subject to}\ \|Az - \tilde{y}\|_2 \le \eta.$$
However, note that the truncation of $H$ to $A$ introduces an additional truncation error $HP_M^\perp x$. Indeed,
$$Ax - \tilde{y} = HP_M^\perp x + e,$$
and this poses a problem, since for the above decoder we require $\eta \ge \|HP_M^\perp x + e\|_2$ in order for $P_M x$ to be a feasible point. For some applications we might have a rough estimate of $\|e\|_2$, but any estimate of $\|HP_M^\perp x\|_2$ would require a priori knowledge of $x$, the signal we are trying to recover. This is generally impossible. (We note in passing that there is some recent work [8] which derives CS recovery guarantees in the absence of feasibility of the target vector $P_M x$, but the application of this work to the sparsity in levels model is not clear.)
To overcome this issue, we will introduce a data fidelity parameter $K \ge M$ and assume that we know $\|e\|_2$, so that we can let $\eta > \|e\|_2$. Then there will always exist a $K' \ge M$ such that $P_K x$ lies in the feasible set $\{z \in \mathbb{C}^K : \|Az - \tilde{y}\|_2 \le \eta\}$ corresponding to the augmented matrix
$$A = HP_K \in \mathbb{C}^{m\times K} \tag{3.3}$$
for all $K \ge K'$. In practice (for the general case) it will also be impossible to determine a sufficient value of $K$, but for fixed $\eta > \|e\|_2$ such a $K$ will always exist. It should, however, be noted that there are special cases, such as Walsh sampling and wavelet recovery, where sufficient values of $K$ are known; see Remark 4.9. This aside, as previously mentioned, we now also modify the optimization problem to include weights. Specifically, let $M, s \in \mathbb{N}^r$ be given sparsity levels and local sparsities, respectively. For positive weights $\omega = (\omega_1, \ldots, \omega_{r+1})$ we define
$$\|z\|_{1,\omega} = \sum_{l=1}^{r+1} \omega_l\, \|P_{M_{l-1}}^{M_l} z\|_1, \qquad M_{r+1} = K.$$
Notice that this weighted regularizer assigns constant weights on each sparsity level. With this in hand, our recovery procedure is
$$\min_z \|z\|_{1,\omega} \ \text{subject to}\ \|Az - \tilde{y}\|_2 \le \eta,$$
with $A$ as in (3.3) and $\eta \ge \|Ax - \tilde{y}\|_2$.
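The weighted regularizer is simple to compute. The sketch below (pure Python, illustrative; the indexing convention, with a final level $\{M_r+1,\ldots,K\}$ carrying the weight $\omega_{r+1}$, is our reading of the setup above) evaluates $\|z\|_{1,\omega}$ level by level:

```python
def weighted_l1(z, M_levels, K, omega):
    """||z||_{1,omega}: constant weight omega[l] on each sparsity level.

    Levels are {M_0+1,...,M_1}, ..., {M_{r-1}+1,...,M_r} and a final
    level {M_r+1,...,K}; z is indexed from 1 conceptually, so z[0] is z_1.
    """
    bounds = [0] + list(M_levels) + [K]
    total = 0.0
    for l, w in enumerate(omega):
        total += w * sum(abs(v) for v in z[bounds[l]:bounds[l + 1]])
    return total

z = [1.0, -2.0, 0.5, 0.0, 4.0, 0.0]                # K = 6
print(weighted_l1(z, [2, 4], 6, [1.0, 1.0, 1.0]))  # 7.5: plain l1 norm
print(weighted_l1(z, [2, 4], 6, [2.0, 1.0, 0.5]))  # 8.5 = 2*3 + 1*0.5 + 0.5*4
```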

The balancing property
We now discuss the relation between the sampling and sparsity bandwidths $N$ and $M$. From generalized sampling theory [2] we know that we must choose $N \ge M$ to obtain a stable mapping between the first $N$ sampling basis functions and the first $M$ sparsity basis functions. The degree of stability of this mapping depends on the so-called balancing property:

Definition 3.2 (Balancing property). Let $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ be an isometry and let $N \ge M \ge 1$. We say that $U$ satisfies the balancing property of order $\theta \in (0,1)$ with respect to $N$ and $M$ if
$$\|P_M U^* P_N U P_M - P_M\|_2 \le 1 - \theta.$$

Note that the balancing property may not hold for every $N \ge M$. However, it always holds for sufficiently large $N$ (for fixed $M$). Indeed, $P_M U^* P_N U P_M \to P_M U^* U P_M = P_M$ in the operator norm as $N \to \infty$, hence the balancing property holds with $\theta$ arbitrarily close to 1 for large enough $N$.
Below we shall see that this property also affects our recovery guarantees, but it will be camouflaged as the quantity $\|G^{-1}\|_2$, where $G = \sqrt{P_M U^* P_N U P_M}$. This gives the following relation.

Lemma 3.3. Suppose that $U$ satisfies the balancing property of order $\theta$ with respect to $N$ and $M$, and let $G = \sqrt{P_M U^* P_N U P_M}$, which is self-adjoint and non-negative definite. Then $G$ is invertible and $\|G^{-1}\|_2 \le 1/\sqrt{\theta}$.

G-adjusted Restricted Isometry Property in Levels (G-RIPL)
Our theoretical analysis requires a RIP-type property for the matrix $HP_M$. However, as implied by the previous discussion, the finite matrix $P_N U P_M \in \mathbb{C}^{N\times M}$ (from which $AP_M$ is constructed) is not an isometry for any $N \ge M$. In particular, unlike in finite dimensions, $\mathbb{E}(P_M H^* H P_M) = P_M U^* P_N U P_M = G^2$ is not the identity. To handle this situation, we introduce the following generalization of the RIP:

Definition 3.4 (G-RIPL). Let $G \in \mathbb{C}^{M\times M}$ be invertible, and let $M = (M_1,\ldots,M_r)$ be sparsity levels and $s = (s_1,\ldots,s_r)$ be local sparsities. The $s$-th $G$-adjusted Restricted Isometry Constant in Levels (G-RICL) $\delta_{s,M}$ is the smallest $\delta \ge 0$ such that
$$(1-\delta)\|Gx\|_2^2 \le \|Ax\|_2^2 \le (1+\delta)\|Gx\|_2^2 \quad \forall x \in \Sigma_{s,M}.$$
If $0 < \delta_{s,M} < 1$ we say that the matrix $A$ satisfies the $G$-adjusted Restricted Isometry Property in Levels (G-RIPL) of order $(s,M)$.
The G-RIPL is of course completely general and can be stated for any $G$. However, in what follows we will let $G = \sqrt{P_M U^* P_N U P_M}$ and show that the matrix $A = HP_K$ (or, equivalently, $HP_M$; note that $\Sigma_{s,M}$ consists of vectors $z$ with $P_M^\perp z = 0$) satisfies the G-RIPL for this particular $G$.
First, however, we show that the G-RIPL implies uniform recovery. For this, we introduce the quantity
$$S_{\omega,s} = \sum_{l=1}^r \omega_l^2\, s_l.$$
Notice in particular that for the choice $\omega = (1,\ldots,1,\omega_{r+1})$ we have $S_{\omega,s} = s_1 + \cdots + s_r$, and for the choice $\omega = (s^{-1/2}, \omega_{r+1})$, i.e. $\omega_l = s_l^{-1/2}$ for $l = 1,\ldots,r$, we have $S_{\omega,s} = r$. Finally, we let $\kappa(G) = \|G\|_2\|G^{-1}\|_2$ denote the condition number of $G$.
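To illustrate the quantity $S_{\omega,s}$, the sketch below assumes the form $S_{\omega,s} = \sum_{l=1}^r \omega_l^2 s_l$, which reproduces both special cases just stated (unit weights give the total sparsity $s_1+\cdots+s_r$; the weights $\omega_l = s_l^{-1/2}$ give $r$):

```python
def S(omega, s):
    """S_{omega,s} = sum_l omega_l^2 * s_l over the r sparsity levels
    (the final weight omega_{r+1} does not enter the sum)."""
    return sum(w ** 2 * sl for w, sl in zip(omega, s))

s = [20, 10, 5, 2]                          # local sparsities, r = 4
unweighted = [1.0] * 4 + [1.0]              # omega = (1, ..., 1, omega_{r+1})
optimal = [sl ** -0.5 for sl in s] + [1.0]  # omega = (s^{-1/2}, omega_{r+1})
print(S(unweighted, s))           # 37.0 = s_1 + ... + s_r
print(round(S(optimal, s), 12))   # 4.0 = r
```

This is exactly why the weights $\omega_l = s_l^{-1/2}$ are attractive: they collapse $S_{\omega,s}$ from the total sparsity down to the number of levels.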
Theorem 3.5. Let $A \in \mathbb{C}^{m\times K}$ and $G \in \mathbb{C}^{M\times M}$ with $K \ge M$, and let $M, s \in \mathbb{N}^r$ be given sparsity levels and local sparsities, respectively. Let $\omega \in \mathbb{R}^{r+1}$ be positive weights. Suppose that $AP_M$ satisfies the G-RIPL of order $(t,M)$ with constant $\delta_{t,M} \le 1/2$, where
$$t_l \ge 2\big\lceil 4\kappa(G)^2\, S_{\omega,s}\, \omega_l^{-2}\big\rceil, \quad l = 1,\ldots,r.$$
Let $\eta \ge 0$, $x \in \mathbb{C}^K$ and $e \in \mathbb{C}^m$ with $\|e\|_2 \le \eta$, and set $y = Ax + e$. Then any solution $\hat{x}$ of the optimization problem
$$\min_z \|z\|_{1,\omega} \ \text{subject to}\ \|Az - y\|_2 \le \eta$$
satisfies
$$\|\hat{x} - x\|_2 \le C\,\frac{\sigma_{s,M}(x)_{1,\omega}}{\sqrt{S_{\omega,s}}} + D\,\eta,$$
where $C = 2(2+\sqrt{3})/(2-\sqrt{3})$, $D = 8\sqrt{2}/(2-\sqrt{3})$ and $\sigma_{s,M}(x)_{1,\omega} = \inf\{\|x - z\|_{1,\omega} : z \in \Sigma_{s,M}\}$.
Notice that the condition on $\delta$ in the above theorem is fundamentally different from the condition found in Theorem 2.8. In the latter one requires
$$\delta_{2s,M} < \big(r(\sqrt{\alpha_{s,M}} + 1/4)^2 + 1\big)^{-1/2},$$
where $\alpha_{s,M} = \max_{k,l=1,\ldots,r} s_k/s_l$ is the sparsity ratio. Thus, for sparsity levels where the local sparsities vary greatly, this bound will be unreasonably small.
In the above theorem we have removed this sparsity ratio term by setting $\delta = 1/2$ and instead requiring
$$t_l \ge 2\big\lceil 4\kappa(G)^2\, S_{\omega,s}\, \omega_l^{-2}\big\rceil.$$
In the unweighted case this leads to a condition of the form $t_l \ge 2\lceil 4\kappa(G)^2(s_1 + \cdots + s_r)\rceil$, which could be difficult to fulfil in practice, since each $t_l$ would have to be greater than the total sparsity of the signal. However, by considering the weights $\omega = (s^{-1/2}, \omega_{r+1})$ we obtain a condition of the form $t_l \ge 2\lceil 4\kappa(G)^2\, r\rceil s_l$, where $t_l$ is independent of $s_k$ for $k \ne l$. This means that we can write the requirement as $\delta_{2\lceil 4\kappa(G)^2 r\rceil s, M} \le 1/2$ and ignore any dependence between the $s$-values, which was the problem in Theorem 2.8.

Sufficient conditions for the G-RIPL
In Definition 2.9 we defined the local coherences $\mu_{k,l}$ of an isometry $U \in \mathbb{C}^{N\times N}$. We extend this to isometries $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ in the exact same way:
$$\mu_{k,l} = \mu\big(P_{N_{k-1}}^{N_k} U P_{M_{l-1}}^{M_l}\big), \quad k,l = 1,\ldots,r.$$
This yields the following theorem.
Theorem 3.6 (Subsampled isometries and the G-RIPL). Let $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ be an isometry, and let $\Omega = \Omega_{N,m}$ be an $(N,m)$-multilevel sampling scheme with $r$ levels. Let $M, s \in \mathbb{N}^r$ be sparsity levels and local sparsities, respectively. Let $\epsilon, \delta \in (0,1)$ and $0 \le r_0 \le r$, with $m = m_{r_0+1} + \cdots + m_r$. Let $s = s_1 + \cdots + s_r$ and
$$L = r\cdot\log(2m)\cdot\log(2N)\cdot\log^2(2s) + \log(\epsilon^{-1}).$$
If $m_k = N_k - N_{k-1}$ for $k = 1,\ldots,r_0$, and
$$m_k \gtrsim \delta^{-2}\cdot(N_k - N_{k-1})\cdot\Big(\sum_{l=1}^r \mu_{k,l}\, s_l\Big)\cdot L$$
for $k = r_0+1,\ldots,r$, then with probability at least $1-\epsilon$ the matrix $A = HP_M$, with $H$ as in (3.1) and $G = \sqrt{P_M U^* P_N U P_M}$, satisfies the G-RIPL of order $(s,M)$ with constant $\delta_{s,M} \le \delta$.

Overall recovery guarantee
Theorems 3.5 and 3.6 yield the following result.
Theorem 3.7. Let $H \in \mathbb{C}^{m\times\infty}$ be as in (3.1) and set $A = HP_K$. Let $x \in \ell^2(\mathbb{N})$, $e' \in \mathbb{C}^m$ and $\eta > 0$. Set $e = HP_K^\perp x + e'$ and $\tilde{y} = Ax + e$. Suppose that (i) we choose $M$ and $N$ so that $U$ satisfies the balancing property of order $0 < \theta < 1$, (iv) the $m_k$'s satisfy $m_k = N_k - N_{k-1}$ for $k = 1,\ldots,r_0$ and
$$m_k \gtrsim \theta^{-2}\cdot r\cdot(N_k - N_{k-1})\cdot\Big(\sum_{l=1}^r \mu_{k,l}\, s_l\Big)\cdot L$$
for $k = r_0+1,\ldots,r$. Then, with probability $1-\epsilon$, any solution $\hat{x}$ of the optimization problem
$$\min_z \|z\|_{1,\omega} \ \text{subject to}\ \|Az - \tilde{y}\|_2 \le \eta$$
satisfies
$$\|\hat{x} - P_K x\|_2 \le C\,\frac{\sigma_{s,M}(x)_{1,\omega}}{\sqrt{S_{\omega,s}}} + D\,\eta,$$
where $C = 2(2+\sqrt{3})/(2-\sqrt{3})$ and $D = 8\sqrt{2}/(2-\sqrt{3})$.
Suppose that $x$ is exactly $(s,M)$-sparse. Then the above theorem guarantees exact recovery of $x$ via weighted $\ell^1$-minimization, subject to the corresponding measurement condition. We note in passing that this measurement condition is optimal up to log factors, in the sense that it is the same as that of the oracle estimator based on a priori knowledge of $\mathrm{supp}(x)$; see [1].

Recovery guarantees for Walsh sampling with wavelet reconstruction
Having presented the abstract infinite-dimensional CS framework in full generality, the remainder of the paper is devoted to its application to binary sampling with the Walsh transform and sparsity in orthogonal wavelet bases. We first describe the setup, before presenting the main recovery guarantees in Sections 4.3 and 4.4.

Walsh functions
For any number $n \in \mathbb{Z}_+ = \{0,1,2,\ldots\}$ there exists a unique dyadic expansion
$$n = n_1 2^0 + n_2 2^1 + \cdots + n_j 2^{j-1} + \cdots,$$
where $n_j \in \{0,1\}$ for $j \in \mathbb{N}$. Similarly, any $x \in [0,1)$ can be written in its dyadic form as
$$x = x_1 2^{-1} + x_2 2^{-2} + \cdots + x_j 2^{-j} + \cdots,$$
with $x_j \in \{0,1\}$ for all $j \in \mathbb{N}$. For a dyadic rational number $x$ this expansion is not unique, as one may use either a finite expansion or an infinite expansion where $x_i = 1$ for all $i \ge k$ for some $k \in \mathbb{N}$. In such cases we always consider the finite expansion. In practice this means that we have removed countably many points from $[0,1)$.
Definition 4.1. Let $n \in \mathbb{Z}_+$ and $x \in [0,1)$, with dyadic expansions as above. The Walsh function $w_n : [0,1) \to \{+1,-1\}$ is given by
$$w_n(x) := (-1)^{\sum_{j=1}^{\infty}(n_j + n_{j+1})x_j}.$$
On the interval $[0,1)$ the Walsh function $w_n$ has $n$ sign changes; $n$ is therefore often called the frequency of $w_n$. The first $2^r$ Walsh functions give rise to the entries of the sequency-ordered Hadamard matrix
$$(V_{\mathrm{Had}})_{i,j} = w_{i-1}\big((j-1)/2^r\big), \quad i,j = 1,\ldots,2^r.$$
We define the sampling basis
$$B^{\mathrm{wh}} := \{w_n : n \in \mathbb{Z}_+\},$$
where "wh" is an abbreviation for Walsh–Hadamard.
Note that this is an orthonormal basis of L 2 pr0, 1qq.
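To make the Walsh functions concrete, the following pure-Python sketch evaluates the sequency-ordered Walsh function under the (assumed) standard formula $w_n(x) = (-1)^{\sum_{j\ge1}(n_j+n_{j+1})x_j}$, and verifies the sequency property that $w_n$ has exactly $n$ sign changes on $[0,1)$:

```python
def walsh(n, x, depth=24):
    """Sequency-ordered Walsh function w_n(x) = (-1)^(sum_j (n_j + n_{j+1}) x_j),
    where n = sum_j n_j 2^(j-1) and x = sum_j x_j 2^(-j) (finite dyadic expansion)."""
    nbits = [(n >> j) & 1 for j in range(depth + 1)]   # nbits[j] = n_{j+1}
    xbits = []
    for _ in range(depth):                             # exact for dyadic rationals
        x *= 2
        b = int(x)
        xbits.append(b)
        x -= b
    e = sum((nbits[j] + nbits[j + 1]) * xbits[j] for j in range(depth))
    return (-1) ** e

# w_n has exactly n sign changes on [0, 1)
for n in range(8):
    vals = [walsh(n, k / 32) for k in range(32)]
    assert sum(vals[i] != vals[i + 1] for i in range(31)) == n
print("sequency check passed")
```

Sampling the first $2^r$ of these functions on the grid $\{(j-1)/2^r\}$ reproduces the rows of the sequency-ordered Hadamard matrix, which are mutually orthogonal.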

Wavelet transform
Let $\phi : \mathbb{R} \to \mathbb{R}$ and $\psi : \mathbb{R} \to \mathbb{R}$ be an orthonormal scaling function and wavelet [13], respectively, with minimal support, corresponding to a multiresolution analysis (MRA). Note that this could be either the classical Daubechies wavelets with minimum phase, or the "symlets", which are close to being symmetric but have a larger phase [26, p. 294]. Let
$$\phi_{j,k}(x) := 2^{j/2}\phi(2^j x - k) \quad \text{and} \quad \psi_{j,k}(x) := 2^{j/2}\psi(2^j x - k) \tag{4.2}$$
denote the scaled and translated versions. A wavelet $\psi$ is said to have $\nu$ vanishing moments if
$$\int_{\mathbb{R}} x^k \psi(x)\,\mathrm{d}x = 0, \quad k = 0,\ldots,\nu-1.$$
For orthogonal wavelets with minimum support, the support depends on the number of vanishing moments; that is,
$$\mathrm{supp}(\phi) = \mathrm{supp}(\psi) = [-\nu+1, \nu]. \tag{4.3}$$
While this system constitutes an orthonormal basis of $L^2(\mathbb{R})$, in our case we require an orthonormal basis of $L^2([0,1))$. There exist several constructions of wavelets on the interval, but we will only consider periodic extensions and the orthogonal boundary wavelets introduced by Cohen, Daubechies and Vial in [12], which preserve the number of vanishing moments. For wavelets on the interval we need to replace the $2\nu$ wavelets/scaling functions intersecting the boundaries at each scale with their corresponding boundary-corrected counterparts. We postpone the formal definitions of periodic and boundary wavelets until the proof sections, where they are needed. To simplify notation, we let $\phi^s_{j,k}$, $s \in \{0,1\}$, denote either a periodic wavelet/scaling function or a boundary wavelet/scaling function as introduced in [12]. In the former case we say that $\phi^s_{j,k}$ "originates from a periodic wavelet", while in the latter we say that it "originates from a boundary wavelet".
We will throughout assume that $J_0 \in \mathbb{Z}_+$ satisfies $2^{J_0} \ge 2\nu$ for $\nu \ge 2$, and $J_0 \ge 0$ for $\nu = 1$. This ensures that there exists at least one $k \in \{0, \dots, 2^j - 1\}$ such that $\operatorname{supp}(\varphi_{j,k}) = \operatorname{supp}(\psi_{j,k}) \subseteq [0,1)$ for all $j \ge J_0$.

Definition 4.3. For a fixed number of vanishing moments $\nu$, minimal decomposition level $J_0$ and a boundary extension which is either periodic or boundary wavelets, let $\varphi^s_{j,k}$ be the corresponding wavelets and scaling functions. We define $B^{J_0,\nu}_{\mathrm{wave}}$ to be the resulting orthonormal wavelet basis of $L^2([0,1))$.

Recovery guarantees
From Section 3 there are four unknown factors depending on $U$ which need to be estimated. These are the local coherences $\mu_{k,l}$, the norm $\|HP_M^\perp\|_{1\to 2}$ where $H$ is given by (3.1), the condition number $\kappa(G) = \|G\|_2 \|G^{-1}\|_2$ and the factor $\|G^{-1}\|_2$ found in condition (3.10).
For the latter two factors we have $G = \sqrt{P_M U^* P_N U P_M}$. Furthermore, we know that $\|G\|_2 \le 1$ since $U$ is an isometry. In practice we therefore only need to determine an upper bound on $\|G^{-1}\|_2$, and from Lemma 3.3 we know that $\|G^{-1}\|_2 \le 1/\sqrt{\theta}$, where $0 < \theta < 1$ is the balancing-property constant. In other words, it suffices to determine when the balancing property holds with a given $\theta$.
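The relations $\|G\|_2 \le 1$ and $\|G^{-1}\|_2 = 1/\sigma_{\min}(P_N U P_M) \le 1/\sqrt{\theta}$ can be illustrated on a toy example. The sketch below is our own illustration: we take $U$ to be the normalised $4\times 4$ sequency-ordered Hadamard matrix, with the illustrative choices $N = 3$ samples and $M = 2$ coefficients.

```python
import math

# U: the 4x4 sequency-ordered Hadamard matrix, normalised so that U is
# an isometry (orthogonal). Rows are w_0, ..., w_3 sampled at j/4.
U = [[0.5 * v for v in row] for row in
     [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]]

# B = P_N U P_M with N = 3 samples and M = 2 coefficients
N, M = 3, 2
B = [[U[i][j] for j in range(M)] for i in range(N)]

# G^2 = B^T B (so G = sqrt(P_M U^* P_N U P_M)); a symmetric 2x2 matrix
a = sum(B[i][0] * B[i][0] for i in range(N))
b = sum(B[i][0] * B[i][1] for i in range(N))
d = sum(B[i][1] * B[i][1] for i in range(N))

# Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]]
mid, rad = (a + d) / 2, math.sqrt(((a - d) / 2) ** 2 + b * b)
lam_max, lam_min = mid + rad, mid - rad

# ||G||_2 = sqrt(lam_max) <= 1, since U is an isometry
assert math.sqrt(lam_max) <= 1 + 1e-12

# Balancing-property constant: theta = sigma_min(P_N U P_M)^2, and
# ||G^{-1}||_2 = 1/sigma_min(B) = 1/sqrt(theta).
theta = lam_min                      # here theta = 1/2
norm_G_inv = 1 / math.sqrt(lam_min)  # here sqrt(2) = 1/sqrt(theta)
assert abs(theta - 0.5) < 1e-12
assert abs(norm_G_inv - math.sqrt(2.0)) < 1e-12
```

For this choice of $N$ and $M$ the inequality $\|G^{-1}\|_2 \le 1/\sqrt{\theta}$ holds with equality, as it always does when $\theta$ is taken to be the smallest admissible balancing constant.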
The following three propositions estimate these quantities for the case $U = [B_{\mathrm{wh}}, B^{J_0,\nu}_{\mathrm{wave}}]$.

Proposition 4.4. Let $U = [B_{\mathrm{wh}}, B^{J_0,\nu}_{\mathrm{wave}}]$. For each $\theta \in (0,1)$ there exists a constant $q_\theta \ge 0$ such that, whenever $N = 2^{k + q_\theta} \ge 2^k = M$, then $U$ satisfies the balancing property of order $\theta$ for all $k \in \mathbb{N}$.
Note that Proposition 4.4 is a consequence of Theorem 1.1 in [20].
Let Ω " Ω m,N be a multilevel random sampling scheme, and let H be as in (3.1). Then We can now present the two main theorems in this section. We point out that these are only valid for ν ě 3 vanishing moments. For ν " 1, the corresponding wavelet is the Haar wavelet, and will be considered in the next subsection. For ν " 2, the coherence of U " rB wh , B J0,2 wave s does not decay as fast as for the other wavelets. Whether this is because our coherence bounds are not sharp enough for this wavelet or if it is because the coherence of U " rB wh , B J0,2 wave s actually decays more slowly is not known. We do, however, present some numerics in Section 6.5 which indicate that it is potentially the latter.
Let $H \in \mathbb{C}^{m \times \infty}$ be as in (3.1) and set $A = HP_K$. Let $x \in \ell^2(\mathbb{N})$, $e' \in \mathbb{C}^m$ and $\eta > 0$. Set $e = HP_K^\perp x + e'$ and $\tilde{y} = Ax + e$. Suppose that
(i) we choose $q = q_\theta$ as in Proposition 4.4, so that $U$ satisfies the balancing property of order $0 < \theta < 1$;
(ii) we choose $\eta \ge \|e'\|_2$ and $K$ so that $\|HP_K^\perp x\|_2 \le \eta$;
(iii) the weight $\omega_{r+1}$ satisfies
(iv) the $m_k$'s satisfy $m_k = N_k - N_{k-1}$ for $k = 1, \dots, r_0$ and
$$ m_k \gtrsim \theta^{-2} \cdot r \cdot 2^{q \max\{k+1-r,\,0\}} \Big( \sum_{l=1}^{r} 2^{-|k-l|} s_l \Big) \cdot L $$
for $k = r_0 + 1, \dots, r$.
Remark 4.9. Note that condition (ii) can be guaranteed using Proposition 4.6. Indeed, it suffices for $K$ to satisfy the condition given there. Hence, given any a priori estimate of the decay of the coefficients $x$ (such as in the case of wavelets), one can use this to determine a suitable $K$.

Uniform recovery for Haar wavelets
Below we shall see that for the Haar wavelet, $P_N U P_N$ is an isometry for $N = 2^r$ with $r \in \mathbb{N}$. This can also be seen from Figure 2, where $U = [B_{\mathrm{wh}}, B^{J_0,\nu}_{\mathrm{wave}}]$ is perfectly block diagonal for $\nu = 1$. This means that the G-RIPL reduces to the $I$-adjusted RIPL, or simply the RIPL, which we know from the finite-dimensional case. Notice in particular that we also avoid any considerations where $K > M = N$ as above, since $HP_M^\perp = 0$.
Proposition 4.10. Let $U = [B_{\mathrm{wh}}, B^{J_0,1}_{\mathrm{wave}}]$ and let $N = 2^k$ for some $k \in \mathbb{N}$ with $k \ge J_0 + 1$. Then $P_N U P_N$ is an isometry on $\mathbb{C}^N$.

Proposition 4.11. Let $U = [B_{\mathrm{wh}}, B^{J_0,1}_{\mathrm{wave}}]$ and let $\mathbf{M} = \mathbf{N} = [2^{J_0+1}, \dots, 2^{J_0+r}]$ be sparsity and sampling levels, respectively. Then the local coherences of $U$ are

It is now straightforward to derive the following.

Theorem 4.12. Let $U = [B_{\mathrm{wh}}, B^{J_0,1}_{\mathrm{wave}}]$ and let $\mathbf{M} = \mathbf{N} = [2^{J_0+1}, \dots, 2^{J_0+r}]$ be sparsity and sampling levels. Let $\mathbf{s} \in \mathbb{N}^r$ be local sparsities and $\mathbf{m} \in \mathbb{N}^r$ be local sampling densities. Let $\epsilon, \delta \in (0,1)$ and $0 \le r_0 \le r$. Let $\tilde{m} = m_{r_0+1} + \dots + m_r$ and $s = s_1 + \dots + s_r$. Suppose that the $m_k$'s satisfy $m_k = N_k - N_{k-1}$ for $k = 1, \dots, r_0$ and
$$ m_k \gtrsim \delta^{-2} s_k \big( r \log(2\tilde{m}) \log(2N) \log^2(2s) + \log(\epsilon^{-1}) \big), $$
for $k = r_0 + 1, \dots, r$.
Proof. Using Proposition 4.10 we know that $P_N U P_N$ is an isometry. Thus, inserting the local coherences from Proposition 4.11 into (2.4) in Theorem 2.10 gives the result.
Theorem 4.13. Let $U = [B_{\mathrm{wh}}, B^{J_0,1}_{\mathrm{wave}}]$ and let $\mathbf{M} = \mathbf{N} = [2^{J_0+1}, \dots, 2^{J_0+r}]$ be sparsity and sampling levels. Let $\mathbf{s} \in \mathbb{N}^r$ be local sparsities, $\omega = (s_1^{1/2}, \dots, s_r^{1/2})$ be weights and $\mathbf{m} \in \mathbb{N}^r$ be local sampling densities. Let $\epsilon \in (0,1)$ and $0 \le r_0 \le r$. Let $m = m_1 + \dots + m_r$, $\tilde{m} = m_{r_0+1} + \dots + m_r$ and $s = s_1 + \dots + s_r$. Suppose we sample $m_k = N_k - N_{k-1}$ for $k = 1, \dots, r_0$ and
$$ m_k \gtrsim r \cdot s_k \cdot \big( r \log(2m) \log(2N) \log^2(2s) + \log(\epsilon^{-1}) \big), $$
for $k = r_0 + 1, \dots, r$. Let $H \in \mathbb{C}^{m \times \infty}$ be as in (3.1) with $A = HP_M$. Let $x \in \ell^2(\mathbb{N})$ and $e \in \mathbb{C}^m$ with $\|e\|_2 \le \eta$ for some $\eta \ge 0$. Set $\tilde{y} = Ax + e$. Then any solution $\hat{x}$ of the optimization problem

Proof. Proposition 4.10 gives $G = \sqrt{P_M U^* P_N U P_M} = \sqrt{I} = I$. Next, notice that $S_{\omega,\mathbf{s}} = r$ and that $P_M x \in \{z \in \mathbb{C}^M : \|Az - \tilde{y}\|_2 \le \eta\}$, since $\|HP_M^\perp\| = 0$. Using Theorem 3.5, we see that we can guarantee recovery of $(\mathbf{s}, \mathbf{M})$-sparse vectors if $A$ satisfies the RIPL with constant $\delta_{\mathbf{t},\mathbf{M}} \le 1/2$, where $t_l = \min\{M_l - M_{l-1}, 8rs_l\}$. Theorem 4.12 now gives the result.
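The isometry and block-diagonality claims for the Haar case can be verified numerically. The sketch below is our own illustration (with the illustrative choices $J_0 = 0$ and $r = 3$): it assembles the finite section of the Walsh-Haar matrix, whose inner products are computed exactly on a dyadic grid since both Haar functions and the Walsh functions $w_n$, $n < 2^r$, are constant on intervals of length $2^{-r}$.

```python
def walsh(n, x, bits=16):
    # sequency-ordered Walsh function w_n(x), x in [0, 1)
    s = 0
    for i in range(1, bits + 1):
        n_i, n_ip1 = (n >> (i - 1)) & 1, (n >> i) & 1
        s += (n_i + n_ip1) * (int(x * 2 ** i) & 1)
    return (-1) ** s

r = 3
N = 2 ** r
grid = [i / N for i in range(N)]

# Haar basis on [0, 1) with J_0 = 0: the scaling function plus the
# wavelets psi_{j,k}(x) = 2^{j/2} psi(2^j x - k) for 0 <= j < r
def haar(j, k, x):
    y = 2 ** j * x - k
    if 0 <= y < 0.5:
        return 2 ** (j / 2)
    if 0.5 <= y < 1:
        return -(2 ** (j / 2))
    return 0.0

basis = [lambda x: 1.0]  # scaling function phi = 1 on [0, 1)
for j in range(r):
    for k in range(2 ** j):
        basis.append(lambda x, j=j, k=k: haar(j, k, x))

# U_{n,m} = <b_m, w_n>; exact on the grid, since every function
# involved is constant on dyadic intervals of length 2^{-r}
U = [[sum(b(x) * walsh(n, x) for x in grid) / N for b in basis]
     for n in range(N)]

# P_N U P_N is an isometry: U^T U = I
for p in range(N):
    for q in range(N):
        dot = sum(U[n][p] * U[n][q] for n in range(N))
        assert abs(dot - (1.0 if p == q else 0.0)) < 1e-12

# Perfect block-diagonal structure (cf. Figure 2):
# <psi_{j,k}, w_n> = 0 unless 2^j <= n < 2^{j+1}
col = 1
for j in range(r):
    for k in range(2 ** j):
        for n in range(N):
            if not (2 ** j <= n < 2 ** (j + 1)):
                assert abs(U[n][col]) < 1e-12
        col += 1
```

The block structure is exactly the statement that the Walsh frequencies $2^j \le n < 2^{j+1}$ interact only with the Haar wavelets at scale $j$, which is why the G-RIPL collapses to the ordinary RIPL in this case.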

Proof of results in Section 3
When deriving uniform recovery guarantees via the RIP, it is typical to proceed as follows. First, one shows that the RIP implies the so-called robust null space property (rNSP) of order $s$ (see Def. 4.17 in [16]). Second, one then shows that the rNSP implies stable and robust recovery. Thus the line of implications reads (RIP) $\Rightarrow$ (rNSP) $\Rightarrow$ (uniform recovery).
A similar line of implications holds for the RIPL and the corresponding robust null space property in levels (rNSPL); see Def. 3.6 in [7].
Both of the recovery guarantees for matrices satisfying the rNSP and the rNSPL consider minimizers of the unweighted quadratically-constrained basis pursuit (QCBP) optimization problem. In our setup we consider minimizers of the weighted QCBP. We have therefore generalized the rNSPL to what we call the weighted robust null space property in levels.
For the sufficient condition for the G-RIPL in Theorem 3.6, the proof follows along similar lines as in [23]. We only sketch the main differences here.
Definition 5.1 (weighted rNSP in levels). Let $\mathbf{M}, \mathbf{s} \in \mathbb{N}^r$ be sparsity levels and local sparsities, respectively. For positive weights $\omega \in \mathbb{R}^{r+1}$, we say that $A \in \mathbb{C}^{m \times M}$ satisfies the weighted robust null space property in levels (weighted rNSPL) of order $(\mathbf{s}, \mathbf{M})$ with constants $0 < \rho < 1$ and $\gamma > 0$ if for all $x \in \mathbb{C}^M$ and all $\Theta \in E_{\mathbf{s},\mathbf{M}}$.
Applying Young's inequality $ab \le \frac{1}{2}a^2 + \frac{1}{2}b^2$, we obtain the desired bound. We then apply the weighted rNSPL, and to complete the proof we use the inequality $\|v_{\Theta^c}\|_{1,\omega} \le \|v\|_{1,\omega}$.

Weighted rNSPL implies uniform recovery
Theorem 5.4. Let $\mathbf{M}, \mathbf{s} \in \mathbb{N}^r$ be sparsity levels and local sparsities, respectively, and let $\omega \in \mathbb{R}^{r+1}$ be positive weights. Let $x \in \mathbb{C}^K$ with $K > M$, and let $e \in \mathbb{C}^m$ with $\|e\|_2 \le \eta$. Set $y = Ax + e$. Let $A \in \mathbb{C}^{m \times K}$ and suppose that $AP_M$ satisfies the weighted rNSP in levels of order $(\mathbf{s}, \mathbf{M})$ with constants $\rho = \sqrt{3}/2$ and $\gamma > 0$. If then any solution $\hat{x}$ of the optimization problem where $C = 2(2+\sqrt{3})/(2-\sqrt{3})$ and $D = 8/(2-\sqrt{3})$.

G-RIPL implies weighted rNSPL
Theorem 5.5. Let $A \in \mathbb{C}^{m \times M}$ and let $G \in \mathbb{C}^{M \times M}$ be invertible. Let $\mathbf{M} \in \mathbb{N}^r$ be sparsity levels, $\mathbf{s}, \mathbf{t} \in \mathbb{N}^r$ be local sparsities and let $\omega \in \mathbb{R}^r$ be positive weights. Suppose that $A$ satisfies the G-RIPL of order $(\mathbf{t}, \mathbf{M})$ with constant $0 < \delta_{\mathbf{t},\mathbf{M}} < 1$, where Then $A$ satisfies the weighted rNSP in levels of order $(\mathbf{s}, \mathbf{M})$ with constants $0 < \rho < 1$ and $\gamma = \sqrt{2}\|G^{-1}\|_2$.
Proof. Let $x \in \mathbb{C}^K$ be such that $P_M^\perp x = 0$, and let $\Theta = \Theta_1 \cup \dots \cup \Theta_r$, where $\Theta_l$ is the set of the largest $s_l$ indices of $P$. Set $\Delta = \{l \in \{1, \dots, r\} : t_l < M_l - M_{l-1}\}$ and notice that $T_{l,k} = \emptyset$ for $l \in \{1, \dots, r\} \setminus \Delta$ and $k \ge 1$. Thus for $k \ge 2$ we get This results in which establishes the weighted rNSPL of order $(\mathbf{s}, \mathbf{M})$ with $0 < \rho < 1$ and $\gamma = \sqrt{2}\|G^{-1}\|_2$.

Proof of Theorem 3.6
Proof of Theorem 3.6. We recall that $U \in \mathcal{B}(\ell^2)$ is an isometry and that
$$ A = \begin{bmatrix} \frac{1}{\sqrt{p_1}} P_{\Omega_1} U P_M \\ \vdots \\ \frac{1}{\sqrt{p_r}} P_{\Omega_r} U P_M \end{bmatrix} $$
with $m = m_1 + \dots + m_r$. Note that $\|Ax\|_2^2 - \|Gx\|_2^2 = \langle (A^*A - G^*G)x, x \rangle$, and therefore $\delta = \sup$ Notice also that $p_k = 1$ and $\Omega_k = \{N_{k-1}+1, \dots, N_k\}$ for $k = 1, \dots, r_0$. Next, notice that the matrix $P_{\Omega_k}$ can be written as where $\{e_i\}_{i=1}^\infty$ is the standard basis of $\ell^2(\mathbb{N})$. It now follows that where the $X_{k,i}$ are random vectors given by $X_{k,i} = \frac{1}{\sqrt{p_k}} P_M U^* e_{t_{k,i}}$. Note that the $X_{k,i}$ are independent, and also that where $G \in \mathbb{C}^{M \times M}$ is non-singular by assumption. Let In particular, $\mathbb{E}(\delta_{\mathbf{s},\mathbf{M}}) \le \delta/2$ provided where $C_2 > 0$ is a constant. Using this, Talagrand's theorem and the fact that $\|P_N U P_M\|_2 \le \|U\|_2 = 1$ (see [23, Sec. 4.3]), we deduce that $\delta^{-2} \log(\epsilon^{-1}) \le 1$. Combining this with (5.23) and (5.24) now completes the proof.

Proof of Corollary 3.7 and Lemma 3.3
Proof of Corollary 3.7. We must ensure that all the conditions are met in order to apply Theorem 3.5 with $P_K x$.
Next, we must ensure that $AP_M$ satisfies the G-RIPL of order $(\mathbf{t}, \mathbf{M})$ with $\delta_{\mathbf{t},\mathbf{M}} \le 1/2$, where According to Theorem 3.6, this occurs if the $m_k$'s satisfy condition (iv). The error bounds (3.7) and (3.8) now follow directly from Theorem 3.5.
Proof of Lemma 3.3. First notice that the balancing property is equivalent to requiring where $\sigma_M(P_N U P_M)$ is the $M$-th largest singular value of $P_N U P_M$. Indeed, since $U$ is an isometry, the matrix $P_M - P_M U^* P_N U P_M$ is nonnegative definite, and therefore This gives (5.26). Next, let $G = \sqrt{P_M U^* P_N U P_M}$ and notice that $\sigma_M(G) = \sigma_M(P_N U P_M)$. This gives $\|G^{-1}\|_2 = 1/\sigma_M(G) \le 1/\sqrt{\theta}$.

Proof of results in Section 4
In Section 4 we found concrete recovery guarantees for Walsh sampling and wavelet reconstruction, using the theorems in Section 3. The key to deriving the Walsh-wavelet recovery guarantees boils down to estimating the quantities $\mu_{k,l}$, $\|HP_M^\perp\|_{1\to 2}$ and $\|G^{-1}\|_2 \le 1/\sqrt{\theta}$. All of these quantities depend directly on $U = [B_{\mathrm{wh}}, B^{J_0,\nu}_{\mathrm{wave}}]$, and to control them we will have to estimate how the entries of $U$ change for varying $n$, $j$, $k$ and $s$. We therefore start this section by setting up notation for wavelets on the interval and stating some useful properties of Walsh functions. In Sections 6.3 and 6.4 we then estimate $\mu_{k,l}$, followed by a discussion of the sharpness of this estimate for $\nu = 2$ in Section 6.5. We finish in Section 6.6 by estimating $\|HP_M^\perp\|_{1\to 2}$, showing how $\theta$ scales for varying $M$ and $N$, and proving Theorems 4.7 and 4.8.
Next we have the boundary wavelet basis with ν vanishing moments. This wavelet basis consists of the same interior wavelets as the periodic basis, but with 2ν boundary scaling and wavelet functions.
As for the interior functions, we also define the scaled versions. The names "left" and "right" correspond to the supports of these functions; that is,
$$ \operatorname{supp}(\varphi^{\mathrm{left}}_{j,k}) = [0,\, 2^{-j}(\nu + k)], \qquad \operatorname{supp}(\varphi^{\mathrm{right}}_{j,k}) = [2^{-j}(2^j - \nu - k),\, 1] $$
for $k = 0, \dots, \nu - 1$.
In the following we shall see that all of our results hold for both periodic and boundary wavelets, but their treatment in some of the proofs differs slightly. To make the treatment as unified as possible, we make the following definition.

Definition 6.1. We say that $\varphi^s_{j,k}$, $s \in \{0,1\}$, "originates from a periodic wavelet" if
$$ \varphi^0_{j,k} := \begin{cases} \varphi^{\mathrm{per}}_{j,k} & k \in \Lambda_{\nu,j,\mathrm{left}} \\ \varphi_{j,k} & k \in \Lambda_{\nu,j,\mathrm{mid}} \\ \varphi^{\mathrm{per}}_{j,k} & k \in \Lambda_{\nu,j,\mathrm{right}} \end{cases}, \qquad \varphi^1_{j,k} := \begin{cases} \psi^{\mathrm{per}}_{j,k} & k \in \Lambda_{\nu,j,\mathrm{left}} \\ \psi_{j,k} & k \in \Lambda_{\nu,j,\mathrm{mid}} \\ \psi^{\mathrm{per}}_{j,k} & k \in \Lambda_{\nu,j,\mathrm{right}} \end{cases}. $$
We say that $\varphi^s_{j,k}$ "originates from a boundary wavelet" if
$$ \varphi^1_{j,k} := \begin{cases} \psi^{\mathrm{left}}_{j,k} & k \in \Lambda_{\nu,j,\mathrm{left}} \\ \psi_{j,k} & k \in \Lambda_{\nu,j,\mathrm{mid}} \\ \psi^{\mathrm{right}}_{j,2^j-1-k} & k \in \Lambda_{\nu,j,\mathrm{right}} \end{cases}, $$
with $\varphi^0_{j,k}$ defined analogously in terms of $\varphi^{\mathrm{left}}_{j,k}$, $\varphi_{j,k}$ and $\varphi^{\mathrm{right}}_{j,2^j-1-k}$.
With these functions defined now for both boundary extensions, the definition of B J0,ν wave is also clear. Next we make a note on the regularity of these orthogonal wavelets.
Definition 6.2. Let $\alpha = k + \beta$, where $k \in \mathbb{Z}_+$ and $0 < \beta < 1$. A function $f \colon \mathbb{R} \to \mathbb{R}$ is said to be uniformly Lipschitz $\alpha$ if $f$ is $k$-times continuously differentiable and its $k$-th derivative $f^{(k)}$ is Hölder continuous with exponent $\beta$, i.e.,
$$ |f^{(k)}(x) - f^{(k)}(y)| < C|x - y|^\beta, \qquad \forall x, y \in \mathbb{R}, $$
for some constant $C > 0$.
In particular, the Daubechies wavelet with one vanishing moment (i.e., the Haar wavelet) is not uniformly Lipschitz, as it is not continuous, whereas for $\nu \ge 2$ we have the constants found in Table 1 [13, p. 239]. For large $\nu$, $\alpha$ grows as $0.2\nu$ [26, p. 294]. Also note that each of the boundary functions $\varphi^{\mathrm{left}}_k$, $\varphi^{\mathrm{right}}_k$ and $\psi^{\mathrm{left}}_k$, $\psi^{\mathrm{right}}_k$ is constructed as a finite linear combination of the interior scaling function $\varphi$ and wavelet $\psi$. All of these boundary functions therefore have the same regularity as $\varphi$ and $\psi$.

6.2 Properties of Walsh functions

Definition 6.3. Let $x = \{x_i\}_{i=1}^\infty$ and $y = \{y_i\}_{i=1}^\infty$ be sequences of binary digits, that is, $x_i, y_i \in \{0,1\}$ for all $i \in \mathbb{N}$. The operation $\oplus$ applied to these sequences gives the sequence $x \oplus y := \{x_i \oplus y_i\}_{i=1}^\infty$, where for two binary digits $x_i, y_i \in \{0,1\}$ we let $x_i \oplus y_i = |x_i - y_i|$.
Proposition 6.4. For $j, m, n \in \mathbb{Z}_+$ and $x, y \in [0,1)$, the Walsh functions satisfy the following properties:
$$ w_n(x \oplus y) = w_n(x)\, w_n(y), \qquad (6.6) $$
$$ w_n(2^{-j} x) = w_{\lfloor n/2^j \rfloor}(x). \qquad (6.7) $$
Proof. Equations (6.5) and (6.6) can be found in any standard text on Walsh functions, e.g. [18], whereas (6.7) follows by inserting $j$ zeros in front of the dyadic expansion of $x$.
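Both identities can be checked numerically. The sketch below is our own illustration, using the sequency-ordered Walsh definition from Section 4; restricting to dyadic rationals $a/2^B$ (the parameter $B$ is an implementation choice) makes the dyadic addition $x \oplus y$ an exact bitwise XOR of the numerators.

```python
def walsh(n, x, bits=16):
    # sequency-ordered Walsh function w_n(x), x in [0, 1)
    s = 0
    for i in range(1, bits + 1):
        n_i, n_ip1 = (n >> (i - 1)) & 1, (n >> i) & 1
        s += (n_i + n_ip1) * (int(x * 2 ** i) & 1)
    return (-1) ** s

B = 8  # work with dyadic rationals a / 2^B, so that the dyadic
       # addition x (+) y is an exact bitwise XOR of the numerators

def dyadic_add(x, y):
    a, b = int(x * 2 ** B), int(y * 2 ** B)
    return (a ^ b) / 2 ** B

# (6.6): w_n(x (+) y) = w_n(x) w_n(y), checked on a sample of points
for n in range(16):
    for a in range(0, 2 ** B, 37):
        for b in range(0, 2 ** B, 41):
            x, y = a / 2 ** B, b / 2 ** B
            assert walsh(n, dyadic_add(x, y)) == walsh(n, x) * walsh(n, y)

# (6.7): w_n(2^{-j} x) = w_{floor(n / 2^j)}(x)
for n in range(16):
    for j in range(3):
        for a in range(0, 2 ** B, 29):
            x = a / 2 ** B
            assert walsh(n, x / 2 ** j) == walsh(n >> j, x)
```

Property (6.7) is visible directly in the formula: dividing $x$ by $2^j$ shifts its dyadic digits, which has the same effect on the exponent sum as shifting the binary digits of $n$.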
6.3 Bounding the inner product $|\langle \varphi^s_{j,k}, w_n \rangle|$

The entries of $U = [B_{\mathrm{wh}}, B^{J_0,\nu}_{\mathrm{wave}}]$ consist of $\langle \varphi^s_{j,k}, w_n \rangle$ for different values of $j$, $k$, $s$ and $n$. Thus, in order to determine the local coherences, we need to find an upper bound for this inner product. Below we derive such a bound for $\nu \ge 2$ vanishing moments and discuss its sharpness. For $\nu = 1$ we determine the magnitude of each matrix entry explicitly.

Lemma 6.5. Let $w_n \in B_{\mathrm{wh}}$ and let $\varphi^s_{j,k} \in B^{J_0,\nu}_{\mathrm{wave}}$ for $\nu \ge 2$. For $j \ge J_0$, $s \in \{0,1\}$ and $k \in \Lambda_j$ we have $|\langle \varphi^s_{j,k}, w_n \rangle|$ bounded in terms of the index set
$$ \Gamma_k = \begin{cases} \{0, \dots, \nu + k - 1\} & k \in \Lambda_{\nu,j,\mathrm{left}}, \\ \{-\nu + 1, \dots, \nu - 1\} & k \in \Lambda_{\nu,j,\mathrm{mid}}, \\ \{k - \nu + 1, \dots, 2^j - 1\} & k \in \Lambda_{\nu,j,\mathrm{right}}, \end{cases} $$
if $\varphi^s_{j,k}$ originates from a boundary wavelet, and if $\varphi^s_{j,k}$ originates from a periodic wavelet.

Proof. First notice that for any $x \in [0,1)$ we have Next, we only consider the interior wavelets $\varphi^s_{j,k}$, i.e. $k \in \Lambda_{\nu,j,\mathrm{mid}}$. For $k \in \Lambda_{\nu,j,\mathrm{left}} \cup \Lambda_{\nu,j,\mathrm{right}}$ we need to handle the two cases where $\varphi^s_{j,k}$ originates from a periodic and from a boundary wavelet separately. The arguments and calculations for the two boundary extensions are analogous. Also, both of these extensions have support of length less than $2\nu$.
Theorem 6.7. Let $\varphi^s_{l,t} \in B^{J_0,\nu}_{\mathrm{wave}}$ with $\nu \ge 3$ and let $w_n \in B_{\mathrm{wh}}$. For $l \ge J_0$ and $2^k \le n < 2^{k+1}$ with $k \in \mathbb{Z}_+$, we have
$$ |\langle \varphi^s_{l,t}, w_n \rangle|^2 \lesssim 2^{-k}\, 2^{-|l-k|} $$
for all $t \in \Lambda_l$ and $s \in \{0,1\}$. For $n = 0$ the bound holds with $k = 0$.
Proof. To obtain the bound above, we combine Lemmas 6.5 and 6.6. We start by arguing that $\varphi^s_{l,t}$ has the same regularity regardless of the boundary extension. Let $a \in \Gamma_t$, where $\Gamma_t$ is as in Lemma 6.5.
If $\varphi^s_{l,t}$ originates from a periodic wavelet, then $\varphi^s_{0,-a}|_{[0,1)}$ has Lipschitz regularity $\alpha > 0$, since both $\varphi$ and $\psi$ have this regularity. Next, if $\varphi^s_{l,t}$ originates from a boundary wavelet and $t \in \Lambda_{\nu,l,\mathrm{mid}}$, then $\varphi^s_{0,-a}|_{[0,1)}$ has Lipschitz regularity $\alpha$ by the same argument. If $t \in \Lambda_{\nu,l,\mathrm{left}} \cup \Lambda_{\nu,l,\mathrm{right}}$, we know from the construction of the boundary functions [12] that these are finite linear combinations of $\varphi_{l,t}$ and $\psi_{l,t}$. These functions therefore possess the same regularity $\alpha$ as the interior functions.
Theorem 6.8. Let w n P B wh and let φ s l,t P B J0,1 wave for l ě 0 and t P Λ l . Then Proof. These equalities can be found in either [5] or [30].
6.4 Proof of Propositions 4.5, 4.10 and 4.11

Using the above results, we are now able to determine the local coherences of $U = [B_{\mathrm{wh}}, B^{J_0,\nu}_{\mathrm{wave}}]$.

About the sharpness of the local coherence bounds
As can be seen from Proposition 4.11, the coherence bounds for $\nu = 1$ are sharp. However, for $\nu \ge 2$ we have not discussed their sharpness. In fact, none of the results in this paper consider the case of $\nu = 2$ vanishing moments. The reason is that these wavelets have Lipschitz regularity $\alpha \approx 0.55$, which means that the bound in Theorem 6.7 would have had less rapid decay had we included them in the theorem. To simplify the presentation we have chosen to exclude them. We now argue that Theorem 6.7 does not seem to extend to wavelets with $\nu = 2$ vanishing moments. Let $\mathbf{M} = \mathbf{N} = [2^{J_0+1}, \dots, 2^{J_0+r}]$ and $U = [B_{\mathrm{wh}}, B^{J_0,\nu}_{\mathrm{wave}}]$ for $\nu \ge 2$. Notice that setting $\nu = 2$ only affects the local coherence estimates $\mu_{k,l}$ for $k \ge l$. For $k < l$, the local coherences are unaffected by the regularity of the wavelet; this follows from Lemma 6.5 by setting $|\mathcal{W}\varphi^s(\cdot + l)(0)| \approx 1$. Next, consider the case $k \ge l$; then Theorem 6.7 suggests that $\mu_{k,l}/\mu_{k+1,l} \approx 4$ for $\nu \ge 3$.

Proof of remaining results in Section 4
Proof of Proposition 4.4. This proposition is a consequence of Theorem 1.1 in [20]. Let $S_N = \operatorname{span}\{w_n : n = 0, \dots, N-1\}$ and let $R_M$ be the span of the first $M$ functions in $B^{J_0,\nu}_{\mathrm{wave}}$. The subspace cosine angle between $S_N$ and $R_M$ is defined by
$$ \cos(\omega(R_M, S_N)) = \inf_{f \in R_M,\, \|f\| = 1} \|P_{S_N} f\|, $$
where $\omega(R_M, S_N) \in [0, \pi/2]$ and $P_{S_N}$ is the orthogonal projection onto $S_N$. As both $B_{\mathrm{wh}}$ and $B^{J_0,\nu}_{\mathrm{wave}}$ are orthonormal bases, the synthesis and analysis operators are unitary. We therefore have Hence, if $U$ satisfies the balancing property of order $\theta \in (0,1)$ for $N$ and $M$, then $1/\cos(\omega(R_M, S_N)) \le 1/\theta$, where $1/\theta > 1$. Next, for $M \in \mathbb{N}$ and $\gamma > 1$ we define the stable sampling rate as
$$ \Theta(M, \gamma) = \min\{N \in \mathbb{N} : 1/\cos(\omega(R_M, S_N)) < \gamma\}. $$
Rearranging the terms, we see that if $N$ and $M$ satisfy the stable sampling rate of order $\gamma = 1/\theta > 1$, then $U$ satisfies the balancing property of order $\theta$ for $N$ and $M$. Theorem 1.1 in [20] states that for $M = 2^r$, $r \in \mathbb{N}$, and for all $\gamma > 1$, there exists a constant $S_\gamma > 1$ (depending on $\gamma$) such that whenever $N \ge S_\gamma M$, then $1/\cos(\omega(R_M, S_N)) < \gamma$. Moreover, we have the relation $\Theta(M, \gamma) \le S_\gamma M = O(M)$. Hence, if $q = \lceil \log_2 S_{1/\theta} \rceil$, we see that the proposition holds with $N = 2^{k+q} \ge S_{1/\theta}\, 2^k > 2^k = M$.
Proof of Theorem 4.8. The theorem is identical to Corollary 3.7, except that we have fixed $M$ and $N$. The concrete values for these have been inserted into condition (iv), together with the local coherences $\mu_{k,l}$. The computation can be found in the proof above.