Ideal denoising within a family of tree-structured wavelet estimators

Abstract: We focus on the performance of tree-structured wavelet estimators belonging to a large family of keep-or-kill rules, namely the Vertical Block Thresholding family. For each estimator, we provide the maximal functional space (maxiset) for which the quadratic risk reaches a given rate of convergence. Following a discussion on the maxiset embeddings, we identify the ideal estimator of this family, that is, the one associated with the largest maxiset. We emphasize the importance of such a result since the ideal estimator is different from the usual (plug-in) estimator used to mimic the performance of the Oracle. Finally, we confirm the good performance of the ideal estimator compared to the other elements of that family through extensive numerical experiments.


Introduction
Wavelet methods are known to be powerful in nonparametric estimation of functions. Indeed, the information of a function is localized in a few large wavelet coefficients for a wide range of function classes. This is the key point in understanding why Hard and Soft thresholding methods perform well. These methods, introduced by Donoho and Johnstone [13], consist in estimating the function by using only the empirical wavelet coefficients which are larger than a chosen threshold value. In particular, these estimators were shown to be near optimal over Besov spaces while being adaptive to the regularity parameter (see Donoho and Johnstone [13,14]). As mentioned by Autin [3], such thresholding rules are elitist in the sense that small empirical wavelet coefficients are not used in the reconstruction of the function.
Recent developments in wavelet thresholding have shown that elitist procedures can be outperformed, both in theory and in practice, by methods which refine the choice of the wavelet coefficients to be used in the reconstruction. This refined choice either makes use of information from neighboring coefficients, e.g., block thresholding methods (see among others Cai [9], Autin [3,5]), or imposes that the empirical coefficients used for the reconstruction of the signal are arranged over a rooted connected tree (see Baraniuk [8], Autin [4]). We denote the latter as Tree Structured Wavelet (TSW) estimators. Interest in TSW already appeared in the works of Donoho [14] and Engel [15,16]. In particular they pointed out the connection between TSW and CART. TSW have been proved useful in curve denoising (see among others Jansen [19], Lee [20], Autin [4]) but their interest goes beyond this, as they offer specific abilities to be used in signal processing (Shapiro [23], Cohen et al. [10]), edge detection (Sun et al. [24]), and the construction of statistical models in the coefficient domain (Freyermuth et al. [17]). This paper does not aim at comparing TSW to other well established methods in curve denoising. Its aim is to suggest a formal treatment of an algorithm that is closely related to the dyadic CART and to emphasize an important aspect about the selection of the ideal procedure among a 'natural' family of TSW estimators. The family of estimators that we will consider includes as special cases two popular TSW estimators, the CART-like estimator obtained by model selection (see Donoho [14] and Engel [15]) and the Hard Tree estimator (see Autin [4]).
Figures 1-4 show an example of a reconstruction of the Blip function using these methods (defined in Section 3) and the associated wavelet coefficient magnitudes (the darker, the larger the coefficient magnitude).
Looking at the positions of the large wavelet coefficients in Figure 1, we notice a hierarchical structure between them. In particular, there are large wavelet coefficients that persist across scales at the location of the singularity. The two methods of reconstruction give estimators in Figures 3 and 4 which appear to be close to the target function. Note that the sets of empirical wavelet coefficients used by the two methods are embedded (see Proposition 3.1). In particular, the cardinality of the set of empirical coefficients used in the reconstruction of the CART-like estimator (Figure 3) is smaller than that of the Hard Tree (Figure 4); the quantitative results of Section 6 support this remark. These facts will be discussed and interpreted throughout the paper.
Donoho [14] proves that estimation under tree constraints can be solved by a CART-like algorithm. A Tree-Oracle estimator is obtained by a recursive per-level method based on the comparison of the $\ell_2$-mean of vertical blocks of the true wavelet coefficients with the standard deviation. This is the best possible tree-structured estimator, minimizing the $L_2$-risk; it is unknown in practice, but its performance can be mimicked by plugging in observed values of the wavelet coefficients and adjusting the threshold value upwards to account for the noise. This estimator is proven to be near-minimax and to perform well in practice. However, in this paper, adopting the maxiset approach, we show that we should not compare local $\ell_2$-norms of empirical wavelet coefficients with the threshold but rather local $\ell_\infty$-norms.

To reach this goal, which is the main result of this paper, we first introduce in Section 3 a general family of TSW estimators, called Vertical Block Thresholding (VBT), which includes the two previous estimators as special cases. Then, we compute the set of all the functions well estimated by each estimator in that family. Namely, we consider the maxiset approach introduced by Cohen et al. [11]. Its basics are presented in Section 4. This theory is applied in Section 5 to find the ideal estimator of the VBT family, that is, the one for which the set of well-estimated functions is the largest functional space. The main result of our paper is expressed in Theorem 5.1 and its Corollary 5.1. Section 6 proposes numerical experiments to confirm the superiority of the ideal estimator, using as a benchmark the informative results obtained by the Tree-Oracle estimator. Finally, after brief concluding remarks in Section 7, Section 8 presents the proofs of our main results.

Wavelet setting and model
Let us consider a compactly supported wavelet basis of $L_2([0,1])$ with $V$ vanishing moments ($V \in \mathbb{N}^*$) which has been previously periodized, $\{\phi, \psi_{jk},\ j \in \mathbb{N},\ k \in \{0, \dots, 2^j - 1\}\}$. Examples of such bases are given in [12]. Any function $f \in L_2([0,1])$ can be written as follows:

$$f = \alpha\phi + \sum_{j \in \mathbb{N}} \sum_{k=0}^{2^j - 1} \theta_{jk}\psi_{jk}. \qquad (1)$$

The coefficient $\alpha$ and the components of $\theta = (\theta_{jk})_{jk}$ are respectively the scaling and wavelet coefficients of $f$. They correspond to the $L_2$-scalar products between $f$ and the scaling/wavelet functions $\phi$ and $\psi_{jk}$. We consider the sequential version of the Gaussian white noise model: we have at our disposal observations of these coefficients, which are assumed to be realizations of independent random variables:

$$\hat\alpha = \alpha + \epsilon\xi, \qquad \hat\theta_{jk} = \theta_{jk} + \epsilon\xi_{jk}, \qquad (2)$$

where $\xi, \xi_{jk}$ are i.i.d. $N(0,1)$, $0 < \epsilon < 1$ is the noise level, and where the sequence $(\theta_{jk})_{j,k}$ is sparse, meaning that only a small number of large coefficients contain nearly all the information about the signal. This motivates the use of keep-or-kill estimators, among which we recall the Hard thresholding estimator:

$$\hat f_S = \hat\alpha\phi + \sum_{(j,k) \in S} \hat\theta_{jk}\psi_{jk}, \qquad (3)$$

where $S = \{(j,k);\ j \in \mathbb{N},\ j < j_{\lambda_\epsilon};\ 0 \le k < 2^j;\ |\hat\theta_{jk}| > \lambda_\epsilon\}$. If $S$ is non empty, it forms an unstructured set of indices of 'large' wavelet coefficients (in the sequel, by 'large' coefficients, we understand those which belong to $S$). Here, $j_\lambda$ is the integer such that $2^{-j_\lambda} \le \lambda^2 < 2^{1-j_\lambda}$ ($0 < \lambda < 1$). For $\lambda_\epsilon < 1$, $j_{\lambda_\epsilon} - 1$ is the finest level up to which we consider the empirical wavelet coefficients to reconstruct the signal $f$.

This term-by-term thresholding does not take into account the information given by the clusters of wavelet coefficients that we observed in Figure 1. This knowledge has a practical consequence: on the one hand, we would not use in the reconstruction a large isolated wavelet coefficient, because it is not likely to be part of the signal; on the other hand, a small coefficient in the neighborhood of large coefficients would be kept.
This motivates the use of refined thresholding methods such as the tree-structured wavelets (Autin [4] and Baraniuk [8]) which we describe in the next section.
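The term-by-term Hard thresholding rule recalled above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the storage of the empirical coefficients as a list of levels (level $j$ holding $2^j$ values) are assumptions made here, not part of the original method description.

```python
import math

def finest_level(lam):
    """Return j_lam, the integer with 2**(-j_lam) <= lam**2 < 2**(1 - j_lam)."""
    # For 0 < lam < 1 this is the smallest integer j with j >= -2 * log2(lam).
    return math.ceil(-2.0 * math.log2(lam))

def hard_threshold(theta_hat, lam):
    """Keep (j, k) when j < j_lam and |theta_hat[j][k]| > lam.

    theta_hat: list of levels (hypothetical layout), level j has 2**j values.
    Returns the index set S and the thresholded coefficients.
    """
    j_max = min(finest_level(lam), len(theta_hat))
    S = {(j, k)
         for j in range(j_max)
         for k, c in enumerate(theta_hat[j])
         if abs(c) > lam}
    kept = [[c if (j, k) in S else 0.0 for k, c in enumerate(level)]
            for j, level in enumerate(theta_hat)]
    return S, kept
```

Note that the decision for each coefficient ignores its neighbors and ancestors, which is precisely the limitation that tree-structured rules address.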

Tree-structured wavelet estimators
Tree-structured wavelet (TSW) estimators are based on the hierarchical interpretation of the wavelet expansion (1). The periodized wavelets {ψ jk } jk are arranged over a nested multiscale structure such that the support of each ψ jk contains the supports of ψ j+1,2k and ψ j+1,2k+1 . This induces a hierarchy among the wavelet coefficients which can be represented over a binary tree rooted in (0, 0) (see Figure 5). Hence, at the location of a singularity in the signal, we observe the persistence of large wavelet coefficients over all scales (see Figure 1). Therefore, considering the wavelet coefficients as a multiresolution sequence provides additional information which we aim to benefit from by imposing a tree/hereditary constraint. The hereditary constraint requires that the set of non zero wavelet coefficients after thresholding forms a connected rooted subtree. In other words, it cannot include an empirical wavelet coefficient unless all its ancestors (defined in equation (4) below) are large.
We denote as $T_J$ the binary tree of depth $J$ whose nodes are the couples of indices $(j,k)$, $0 \le j < J$, $k \in \{0, \dots, 2^j - 1\}$ (see Figure 5). For any couple of indices $(j,k)$, following Engel [16], we define the sets which contain:
• its ancestors

$$P(j,k) = \{(j', \lfloor k / 2^{j-j'} \rfloor),\ 0 \le j' \le j\}, \qquad (4)$$

where $\lfloor x \rfloor$ denotes the largest integer smaller than or equal to $x$;
• its descendants

$$C(j,k) = \{(j,k),\ (j+1, 2k),\ (j+1, 2k+1),\ \dots,\ (j', 2^{j'-j}k),\ \dots,\ (j', 2^{j'-j}(k+1) - 1),\ \dots\}, \quad j \le j' < J.$$

Note that to each node of indices $(j,k)$ correspond $2^{j'-j}$ descendants at each level $j'$ ($j \le j' < J$) and $j+1$ ancestors, the node itself being counted in both sets.
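As a small illustration of this indexing, the two sets can be enumerated as follows. This is a sketch under the convention, suggested by the counts stated above, that a node belongs to both its own set of ancestors and its own set of descendants; the function names are hypothetical.

```python
def ancestors(j, k):
    """Nodes on the path from (j, k) up to the root (0, 0), (j, k) included."""
    # The ancestor at level jp has position floor(k / 2**(j - jp)).
    return [(jp, k >> (j - jp)) for jp in range(j, -1, -1)]

def descendants(j, k, J):
    """Nodes of the subtree of depth-J tree T_J rooted at (j, k), (j, k) included."""
    # At level jp the descendants occupy positions 2**(jp-j)*k .. 2**(jp-j)*(k+1)-1.
    return [(jp, kp)
            for jp in range(j, J)
            for kp in range(k << (jp - j), (k + 1) << (jp - j))]
```

For instance, `ancestors(3, 5)` lists the four nodes `(3, 5)`, `(2, 2)`, `(1, 1)`, `(0, 0)`, in line with the count of $j + 1$ ancestors.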
Remark 2.1. When using smooth wavelets, the presence of an edge 'generates' several large wavelet coefficients at each scale, due to the overlapping supports of the wavelets. This idea is the leitmotif of (intrascale) block thresholding methods, but could also be applied to TSW. In such a case, the heredity constraint would mean that a node at scale $j$ in the tree has more than two descendants at scale $j + 1$ (see Baraniuk [8], Averkamp and Houdré [7]). In this paper, we consider the situation where each node has two descendants at the next scale and therefore we naturally associate binary trees with wavelet coefficient sequences.
Let us now introduce the definition of a tree-structured estimator in our setting.
Definition 2.1. We call tree-structured estimator of a signal $f$ satisfying (1) any keep-or-kill estimator

$$\hat f_T = \hat\alpha\phi + \sum_{(j,k) \in T} \hat\theta_{jk}\psi_{jk},$$

where the set of indices $T$ satisfies the hereditary constraint formulated in Engel [15], that is, if $(j,k)$ is in $T$ then all its ancestors are in $T$.
In the sequel we denote by $|T|$ the cardinality of the tree $T$, i.e., the number of active wavelet coefficients kept in the estimator $\hat f_T$. Analogously to the Hard thresholding estimator defined in (3), we only use the empirical wavelet coefficients on levels smaller than $j_{\lambda_\epsilon}$.
Donoho [14] used the Oracle approach to propose a tree-structured near optimal estimator. His idea was to find a tree-structured estimator which mimics the optimal risk $R_\epsilon(f)$ only attained by the "Tree-Oracle", that is

$$R_\epsilon(f) = \min_{T} \mathbb{E}\|\hat f_T - f\|_2^2,$$

where the minimum is taken over all the tree-structured estimators. Donoho [14] showed that the solution of this optimization problem under a tree constraint has an inheritance property and therefore can be solved by a CART-like algorithm applied to the true wavelet coefficients using $\epsilon$ as the threshold value. In the sequel $T_O$ stands for the set of coefficients selected by the Tree-Oracle. In practice, $\hat f_O = \hat\alpha\phi + \sum_{(j,k) \in T_O} \hat\theta_{jk}\psi_{jk}$ is not available. Donoho [14] proposed to consider the estimator $\hat f_{cart}$ which minimizes the empirical complexity. Furthermore it was shown that the risk of $\hat f_{cart}$ is of the same order as the optimal risk up to a logarithmic term. Precisely, for $m$ large enough, there exists a constant $K > 0$ not depending on $\epsilon$ such that the corresponding risk bound holds for any $f \in L_2([0,1])$.

Vertical block thresholding estimators
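Let us now define a general Vertical Block Thresholding (VBT) estimator $\hat f_p$, for any $1 \le p \le \infty$, as follows.

Definition 3.1. For given $0 < \lambda < 1$, $1 \le p \le \infty$ and any set of real numbers $\theta = (\theta_{jk},\ 0 \le j < j_\lambda,\ 0 \le k < 2^j)$, we define the sets of indices $E_{jk}(\theta, \lambda)$, for any $(j,k)$, iteratively as follows:
• For $j = j_\lambda - 1$ and for any $k$,

$$F_{jk}(\theta,\lambda) = \{(j,k)\}, \qquad E_{jk}(\theta,\lambda) = \begin{cases} F_{jk}(\theta,\lambda) & \text{if } |\theta_{jk}| > \lambda, \\ \emptyset & \text{otherwise.} \end{cases}$$

• For any $0 \le j < j_\lambda - 1$ and any $k$, we put

$$F_{jk}(\theta,\lambda) = \{(j,k)\} \cup E_{j+1,2k}(\theta,\lambda) \cup E_{j+1,2k+1}(\theta,\lambda), \qquad E_{jk}(\theta,\lambda) = \begin{cases} F_{jk}(\theta,\lambda) & \text{if } \|\theta / F_{jk}(\theta,\lambda)\|_p > \lambda, \\ \emptyset & \text{otherwise,} \end{cases}$$

where $\|\theta / F\|_p$ denotes the $\ell_p$-mean of the coefficients indexed by $F$, that is, $\|\theta / F\|_p^p = |F|^{-1} \sum_{(j',k') \in F} |\theta_{j'k'}|^p$ for $p < \infty$ and $\|\theta / F\|_\infty = \max_{(j',k') \in F} |\theta_{j'k'}|$.

The $(\lambda, p)$-VBT-method is illustrated on an example in Appendix 8.1. For any $p \in [1, \infty]$, it is associated with the following estimator:

$$\hat f_p = \hat\alpha\phi + \sum_{(j,k) \in T_p} \hat\theta_{jk}\psi_{jk},$$

where $T_p$ is the set of indices of the empirical wavelet coefficients used in the reconstruction following the VBT method based on $\ell_p$-norms. We encourage the reader to check that for $p = 2$ (resp. $p = \infty$) the estimator $\hat f_p$ is the CART-like estimator (resp. the Hard Tree estimator).

These estimators have an interesting interpretation using the terminology of wavelet thresholding. At each node $(j,k)$, we consider the coefficient at $(j,k)$ and those which survive the previous step (i.e., at scale $j + 1$). They form a connected subtree $F_{jk}(\theta, \lambda)$ of $C(j,k)$ rooted at $(j,k)$. The decision to keep or kill this block of coefficients depends on its $\ell_p$-mean, which is compared with the threshold $\lambda_\epsilon$. We remark that, unlike other block thresholding methods, there is no need to control the size of the blocks by any additional parameter.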
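The bottom-up keep-or-kill procedure described above can be sketched as follows. This is a hedged reading of the verbal description, not a reference implementation: it assumes that the block at a node is the node together with the surviving blocks of its two children, that the $\ell_p$-mean is normalized by the block size, that a block is kept when its mean is strictly larger than the threshold, and that the finest stored level plays the role of $j_\lambda - 1$. The names `lp_mean` and `vbt_tree` are hypothetical.

```python
import math

def lp_mean(values, p):
    """l_p-mean of a block: (average of |v|**p)**(1/p), or the max for p = inf."""
    if math.isinf(p):
        return max(abs(v) for v in values)
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)

def vbt_tree(theta, lam, p):
    """Index set kept by a (lam, p)-VBT-style bottom-up pruning.

    theta: list of levels, level j holding 2**j coefficients (levels 0..J-1).
    Assumption: the finest stored level J-1 plays the role of j_lam - 1.
    """
    J = len(theta)
    E = {}  # E[(j, k)]: surviving block rooted at (j, k), possibly empty
    for j in range(J - 1, -1, -1):
        for k in range(2 ** j):
            F = {(j, k)}
            if j < J - 1:
                # Add whatever survived at the two children.
                F |= E[(j + 1, 2 * k)] | E[(j + 1, 2 * k + 1)]
            block = [theta[jj][kk] for (jj, kk) in F]
            E[(j, k)] = F if lp_mean(block, p) > lam else set()
    return E[(0, 0)]

# Toy example: one large coefficient at the finest level, small ancestors.
theta = [[1.0], [0.2, 0.3], [0.1, 0.3, 0.5, 1.0]]
tree_2 = vbt_tree(theta, 0.9, 2)            # l_2 rule kills the block at (1, 1)
tree_inf = vbt_tree(theta, 0.9, math.inf)   # l_inf rule keeps the path to (2, 3)
```

On this toy input the $\ell_2$ rule discards the large coefficient at $(2, 3)$ because the mean of its block with the small parent $(1, 1)$ falls below the threshold, whereas the $\ell_\infty$ rule keeps the whole path to the root.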
From now on, we will study the performance of these VBT estimators to address the following question: is the $\ell_2$-norm the best choice to consider among the $\hat f_p$ estimators ($1 \le p \le \infty$)? In the next sections we use the maxiset approach to prove that the answer is NO.
Define the Vertical Block Thresholding family ($\mathrm{VBT}_\epsilon$) as

$$\mathrm{VBT}_\epsilon = \{\hat f_p,\ 1 \le p \le \infty\}.$$

At first glance, as $1 \le p \le \infty$ is real-valued, this family of estimators $\mathrm{VBT}_\epsilon$ seems to be uncountable. But it is not, since the estimators are tree-structured and only finitely many trees are available below level $j_{\lambda_\epsilon}$. More precisely,

Proposition 3.1. For any $1 \le p \le q$ and for given $\lambda_\epsilon$,
1. $T_p$ and $T_q$ constitute trees of indices,
2. $T_p \subseteq T_q$,
3. $T_\infty$ is the smallest tree (in terms of cardinality) which contains all 'large' empirical wavelet coefficients.
According to the previous proposition, we deduce that VBT ǫ is a family of tree-structured estimators with embedded trees. The larger p, the bigger the tree.

Maxiset approach
In this section we recall the maxiset approach. The maxiset point of view has been proposed by Cohen et al. [11] to measure the performance of estimators. For a given estimator $\hat f$ and a chosen sequence $v = (v_\epsilon)_\epsilon$ tending to 0 when $\epsilon$ goes to 0, this approach consists in providing the set of all the functions (maxiset) for which the rate of convergence of the quadratic risk of $\hat f$ is at least as fast as $v$.
In this setting, the functional space $G$ will be called maxiset of $\hat f$ for the rate of convergence $v$ if and only if the following property holds:

$$\sup_{0 < \epsilon < 1} v_\epsilon^{-1}\, \mathbb{E}\|\hat f - f\|_2^2 < \infty \iff f \in G.$$

From now on we shall adopt the following notation: $MS(\hat f, v) = G$. Hence, the maxiset approach appears to be more optimistic than the minimax one. The following scheme illustrates this idea.
The maxiset setting allows one to compare different procedures efficiently. This approach relies on the fact that the larger the maxiset, the better the procedure. Following Kerkyacharian and Picard [21,22] and Autin [3], this way of measuring the performance of procedures can often discriminate between procedures that are equivalent in the minimax sense, and give theoretical explanations for some phenomena observed in practice (see Section 6).

Functional spaces: Definitions and embeddings
In this paragraph, we characterize the functional spaces which shall appear in the maxiset study of our estimators. Recall that, for later use of these functional spaces, we shall consider wavelet bases with $V$ vanishing moments. Besov spaces naturally appear in estimation problems (see Autin [3] and Cohen et al. [11]). These spaces characterize the functions for which the energy of wavelet coefficients on levels larger than $J$ ($J \in \mathbb{N}$) is decreasing exponentially in $J$. We recall some properties of embeddings. For an overview of these spaces, see Härdle et al. [18].
Let us now define a new function space which is the key to our results: Definition 5.2. Let 0 < r < 2 and 1 ≤ p ≤ ∞. We say that a function f belongs to the space W r,p if and only if: First, note that the larger r, the larger the functional space; second, in contrast to weak Besov spaces (see Cohen et al. [11] for an explicit definition) which appear in the maxiset results for Hard and Soft thresholding estimators, the spaces W r,p (0 < r < 2) are not invariant under permutations of wavelet coefficients within each scale. This property makes them able to distinguish functions according to the "clustering properties" of their wavelet coefficients. These functional spaces are quite large as suggested by our following Proposition 5.1.
Proposition 5.2 below shows that, for the same parameter $r$ ($0 < r < 2$), the functional spaces $W_{r,p}$ ($p \ge 1$) are embedded: the larger $p$, the larger $W_{r,p}$. Moreover, in Theorem 5.1, the intersections of function spaces appearing in equation (9) below are shown to be directly related to the maxisets of the estimators $\hat f_p \in \mathrm{VBT}_\epsilon$.

Proposition 5.2. For any $1 \le p < q$ and any $0 < r < 2$, we have the following embeddings of spaces:

Maxiset results
In this paragraph we provide the maximal space (maxiset) of any $\hat f_p \in \mathrm{VBT}_\epsilon$ associated with the rate $\lambda_\epsilon^{4s/(1+2s)}$ ($s > 0$). This corresponds to the optimal minimax rate over Besov spaces $B^s_{\gamma,\infty}$, $s > \frac{1}{\gamma} - \frac{1}{2}$, under the $L_2$-risk, up to a logarithmic term. In the maxiset context, this is a traditional choice which has nothing to do with a price to pay for adaptivity. It gives a maxiset that is simpler to interpret than the one we would get using the exact optimal minimax rate. For our purpose, using the exact rate would be an unnecessary complication, since the choice of the rate does not make any difference in the identification of the maxiset-ideal method among the $(\lambda, p)$-VBT family.
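Theorem 5.1. Let $s > 0$, $1 \le p \le \infty$ and $\lambda_\epsilon = m\epsilon\sqrt{\log(\epsilon^{-1})}$. For any $m \ge 4\sqrt{3}$, we have the following equivalence:

$$\sup_{0 < \epsilon < 1} \lambda_\epsilon^{-\frac{4s}{1+2s}}\, \mathbb{E}\|\hat f_p - f\|_2^2 < \infty \iff f \in B^{\frac{s}{1+2s}}_{2,\infty} \cap W_{\frac{2}{1+2s},p},$$

that is to say, using the maxiset notation, $MS\big(\hat f_p, (\lambda_\epsilon^{4s/(1+2s)})_\epsilon\big) = B^{\frac{s}{1+2s}}_{2,\infty} \cap W_{\frac{2}{1+2s},p}$.

Note that these maxisets are large functional spaces since from Proposition 5.1 we deduce that the functional space $B^{\frac{s}{1+2s}}_{2,\infty} \cap W_{\frac{2}{1+2s},p}$ contains the space $B^s_{\gamma,\infty}$ for any $\gamma \ge \min(2, s^{-1})$. Hence this maxiset contains many functions that cannot be reconstructed by linear procedures at the rate $\lambda_\epsilon^{4s/(1+2s)}$. We now state the main result of the paper through the following corollary.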
Corollary 5.1. Under the assumptions of Theorem 5.1, $\hat f_\infty$ is the ideal estimator in the maxiset sense among the $\mathrm{VBT}_\epsilon$ family.
Proof. Theorem 5.1 establishes the maxiset associated with any estimator $\hat f_p$ built with the $(\lambda_\epsilon, p)$-VBT method. According to (8) of Proposition 5.2 we deduce that the maxisets of these estimators are embedded and that the largest maxiset is the one associated with $\hat f_\infty$ (Hard Tree estimator).
Although $\hat f_2$ was shown to be very powerful by using the Oracle approach (see Donoho [14]), $\hat f_\infty$ is better in the maxiset sense. This result can be interpreted as the necessity to keep all empirical wavelet coefficients larger than $\lambda_\epsilon$ in the reconstruction. Missing some of them has a huge maxiset cost, which corresponds to the exclusion of many functions estimated at the same rate. Moreover this suggests including not only all the 'large' empirical wavelet coefficients but also some well chosen small ones. Autin [3] already underlined this important issue through what he calls cautious rules. In particular he proved that $\hat f_\infty$ outperforms Hard and Soft thresholding estimators in the maxiset sense.
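As a quick consistency check on the exponents, note that the technical lemmas of the proofs section are phrased for spaces of the form $B^{(2-r)/4}_{2,\infty} \cap W_{r,p}$. Taking $r = 2/(1+2s)$ recovers both the Besov index and the rate exponent appearing in Theorem 5.1; the short computation below is only this arithmetic, nothing more:

```latex
r = \frac{2}{1+2s}
\quad\Longrightarrow\quad
\frac{2-r}{4} = \frac{1}{4}\cdot\frac{2(1+2s)-2}{1+2s} = \frac{s}{1+2s},
\qquad
2 - r = \frac{4s}{1+2s}.
```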

Numerical experiments
We first introduce the notations of the nonparametric model we are dealing with:

We refer the reader to the classical literature (e.g., Tsybakov [25]) for details about the equivalence between this nonparametric regression model and the sequence model given by equation (2). We only recall that the noise level $\epsilon$ is such that $\epsilon = \sigma/\sqrt{N}$. This section proposes numerical experiments designed to check whether the choice of the $\ell_\infty$-norm should be preferred, as claimed by Corollary 5.1. The previous theory does not model all the complexity encountered in practice, such as the choice of the wavelet function, of the primary resolution scale, etc. Therefore, we choose a classical setting for numerical experiments, using the Daubechies 8 Least Asymmetric wavelet. In addition, our theoretical model considers neither method-dependent nor data-driven thresholds. Hence, for these experiments, we naturally decide to use the universal threshold value for all methods, i.e., $\lambda = \hat\sigma\sqrt{2N^{-1}\log N}$. We follow a standard approach to estimate $\sigma$ by the Median Absolute Deviation (MAD) of the wavelet coefficients at the finest wavelet scale $J - 1$, divided by 0.6745 (see e.g., Vidakovic [26]).
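The threshold computation described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the MAD is taken around the median of the finest-scale coefficients (some implementations use the median of the absolute values instead), and the coefficients passed in are assumed to carry noise of standard deviation $\sigma$.

```python
import math
import statistics

def mad_sigma(finest_coeffs):
    """Estimate sigma as median(|d - median(d)|) / 0.6745 on the finest scale."""
    med = statistics.median(finest_coeffs)
    mad = statistics.median([abs(d - med) for d in finest_coeffs])
    return mad / 0.6745

def universal_threshold(finest_coeffs, n):
    """lambda = sigma_hat * sqrt(2 * log(n) / n), n being the sample size N."""
    return mad_sigma(finest_coeffs) * math.sqrt(2.0 * math.log(n) / n)
```

The constant 0.6745 is the usual normalization making the MAD a consistent estimator of the standard deviation under Gaussian noise.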
We generate the data sets from a large panel of functions often used in wavelet estimation studies (Antoniadis et al.

There are numerous connections between keep-or-kill estimation and hypothesis testing (see Abramovich et al. [1]). We will get an interesting insight into these methods by computing the number of false positives/negatives (i.e., type I/II errors). To do so, we compare the set of indices of wavelet coefficients kept by each estimator ($T_p$) and by the Tree-Oracle ($T_O$) with that of the keep-or-kill Oracle estimator, $S_O = \{(j,k);\ j \in \mathbb{N},\ j < j_{\lambda_{\sigma/\sqrt{N}}};\ 0 \le k < 2^j;\ |\theta_{jk}| > \sigma/\sqrt{N}\}$. In addition, we give in Tables 1 and 2

Comparing the MISE of $\hat f_2$ with that of $\hat f_\infty$, we observe that the latter performs better for most of the test functions, sometimes with substantial improvements, up to 16% for the function 'doppler'. In the other cases, the loss of $\hat f_\infty$ against $\hat f_2$ remains under 7%. Moreover, for many of these functions we observe a monotone decrease in the MISE as the value of $p$ increases, reflecting the embeddings of the maxisets of the $\mathrm{VBT}_\epsilon$ estimators (see Section 5).
Looking at the number of false positives/negatives, we can check that $\hat f_\infty$ reduces the number of false negatives with a comparatively small increase in the number of false positives, which explains its good performance in terms of MISE. Comparing the results to those of the Tree-Oracle, we observe that there are potentially huge improvements achievable by reducing the number of false negatives. Indeed, the number of active coefficients of the Tree-Oracle estimator (see Section 2.2), $|T_O|$, is about 25% to 110% larger than $|T_\infty|$.
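Counting these errors amounts to two set differences between an estimator's index set and the reference set of the keep-or-kill Oracle. A minimal sketch, with a hypothetical function name and index sets represented as Python sets of `(j, k)` pairs:

```python
def selection_errors(selected, reference):
    """Count false positives and false negatives of an index set.

    selected:  set of (j, k) kept by an estimator (e.g. T_p or T_O).
    reference: set of (j, k) kept by the keep-or-kill Oracle (S_O).
    """
    false_positives = len(selected - reference)  # kept but not in the reference
    false_negatives = len(reference - selected)  # in the reference but killed
    return false_positives, false_negatives
```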

Conclusions
In this paper we introduced the family of the Vertical Block Thresholding estimators. We studied their performances under L 2 -risk using the maxiset approach, and we identified the ideal procedure, that is the one obtained from the (λ ǫ , ∞)-VBT-method. The main message of this paper is that the ideal estimator is different from the classical one obtained by plugging-in empirical quantities in the Tree-Oracle which corresponds to the estimator built from the (λ ǫ , 2)-VBT-method. Indeed, compared to the latter one, the ideal estimator is able to reconstruct more functions at the chosen rate.
It is important to emphasize that we compared both theoretically and numerically all these estimators for a fixed threshold value. We have chosen to use the universal threshold value for the numerical experiments although it is known to be too conservative in practice, simply in order to use the most standard choice for our comparisons.
Our theoretical and numerical results emphasize the importance of reducing the number of false negatives while keeping the number of false positives under control. In addition, the numerical experiments which implement the Tree-Oracle estimator show the important potential in reducing the number of false negatives. To do so, using these methods, we should either consider more complex hereditary constraints or allow lower threshold values. Indeed, large threshold values lead to suboptimal estimation of the localized structure in the underlying curve. It would be more convenient to use a minimum risk threshold rather than the universal threshold (cf. Jansen [19]) but, when used with Hard thresholding, the estimate often shows unappealing visual artifacts (spurious bumps) due to large wavelet coefficients at fine resolution scales generated from the random noise ("false positives"). In this context, and as part of future research, we expect the vertical block thresholding algorithms to be powerful also for $p < \infty$, as they adaptively keep or kill blocks of coefficients even if these contain coefficients larger than the threshold value. Hence, the control of false positives is achieved not only by the threshold value but by the algorithm too. The concluding message of the present results is that practical application would require optimizing simultaneously over the parameter $p$ and over the threshold value.

Illustration of the Definition 3.1
In order to illustrate Definition 3.1 and Proposition 3.1, let us consider the example of a sequence $|\theta|$ of wavelet coefficient magnitudes given by the tree A in Figure 8. We apply to this tree the $(\lambda, p)$-VBT-method for $p \in \{2, \infty\}$ with $\lambda = 0.9$, which yields the trees B and C.
In what follows we give the detailed steps of the iterative algorithm described in Definition 3.1 for the $(\lambda = 0.9, p = 2)$-VBT method:

Figure 8. A: an example of a tree of wavelet coefficient magnitudes $|\theta_{j,k}|$; B: the result of the $(\lambda = 0.9, p = 2)$-VBT method applied to this example; C: the result of the $(\lambda = 0.9, p = \infty)$-VBT method applied to this example.

Proof of Proposition 3.1
The proofs of 1. and 3. are obvious. To prove 2., we first notice that from Definition 3.1 the sets E jk (θ, λ) and F jk (θ, λ) depend on p. For notational convenience, we suppress the dependence on this parameter in the paper except for this proof as it is a crucial aspect to consider. Then we need the following Lemma 8.1.
Corollary 8.1. Let $1 \le p < q \le \infty$, $0 < \lambda < 1$ and consider a sequence of real numbers $\theta := (\theta_{jk},\ 0 \le j < j_\lambda,\ 0 \le k < 2^j)$. Then, for any couple of indices $(j,k)$, the following property holds:

Proof. Because of the $(\lambda, \cdot)$-VBT-method, property (13) holds if and only if

This statement is a consequence of Lemma 8.1.

The proof of Proposition 3.1 is then deduced from the corollary above.

Proof of Proposition 5.1
Proof. According to (8) of Proposition 5.2, it suffices to state the embedding for the case $p = 2$. Let $f \in B^s_{2,\infty}$. There exists $C > 0$ such that, for any $j \in \mathbb{N}$, the wavelet coefficients of $f$ satisfy:

Fix $0 < \lambda < 1$. Let $j_{\lambda,s}$ be the integer such that $2^{-j_{\lambda,s}} \le \lambda^{\frac{2}{1+2s}} < 2^{1-j_{\lambda,s}}$. Notice that $j_{\lambda,s} \le j_\lambda$.

Proof of the maxiset results
In this section, we first provide technical lemmas which shall be used to prove the maxiset result established in Theorem 5.1.

Lemma 8.2. Let $f \in B^{\frac{2-r}{4}}_{2,\infty} \cap W_{r,p}$. Then:

Proof. Let $f \in B^{\frac{2-r}{4}}_{2,\infty} \cap W_{r,p}$. Then its wavelet coefficients satisfy:

For any $n \in \mathbb{N}$, we denote by $j_{\lambda,n}$ the smallest integer such that

For any $n \in \mathbb{N}$, the number of wavelet coefficients of interest can be upper bounded by counting $j + 1$ ancestors for each leaf at level $j$ in the tree (a leaf is a coefficient satisfying $\lambda 2^n < \min_{(j',k') \in P(j,k)} \|\theta / F_{j'k'}(\theta, \lambda)\|_p \le \lambda 2^{1+n}$ and $|\theta_{jk}| > \lambda 2^n$). So,

For any $n \in \mathbb{N}$, the leaves $(j,k)$ with level $j < j_{\lambda,n}$ are the same as the ones obtained from the $(\lambda 2^n, p)$-VBT-method satisfying $\min_{(j',k') \in P(j,k)} \|\theta / F_{j'k'}(\theta, \lambda 2^n)\|_p \le \lambda 2^{1+n}$. Moreover the number of such leaves is smaller than or equal to the number of wavelet coefficients $\theta_{jk}$ with absolute value strictly larger than $\lambda 2^n$ and such that min

Since $f \in B^{\frac{2-r}{4}}_{2,\infty}$, one has $B < \infty$. This ends the proof.

Lemma 8.3. Let $0 < \lambda < 1$, $1 \le p \le \infty$, $(j,k)$ be a couple of indices and $\theta$ be a sequence of wavelet coefficients. The two following properties are equivalent:

ii) There exists a tree $T$ rooted at $(j,k)$ such that:

Proof. We only prove the equivalence property for any $1 \le p < \infty$ since the proof for the case $p = \infty$ is analogous.
Hence ii) is satisfied.
So i) is satisfied. This ends the proof.
(8) The embedding property is a direct consequence of 2. of Proposition 3.1.

(9) The large inclusions are due to (8). To prove the strict embedding we construct a function which belongs to $B^u_{2,\infty} \cap W_{\frac{2}{1+2s},\infty}$ but not to $W_{\frac{2}{1+2s},2}$. The main idea to construct such a function is to ensure that $\hat f_\infty$ uses all the coefficients up to the finest scale and that $\hat f_2$ thresholds the finest scale. To do so, we put non zero coefficients at each odd scale $j$ and, within each scale, only one non zero coefficient out of two. Hence, the $\ell_2$-norm of the first block of coefficients, i.e., $F_{j_\lambda - 2,k}$, is lower than the threshold. In other words, $\hat f_2$ sets a whole scale of coefficients to zero whereas $\hat f_\infty$ keeps them. Now formally, let us consider the function h with wavelet coefficients (θ jk ) jk satisfying: θ jk = 2 − j 2 if j and k are odd and 0 ≤ k < 2(j + 1) 2 j 1+2s , θ jk = 0 otherwise.