OPTIMALLY SPARSE 3D APPROXIMATIONS USING SHEARLET REPRESENTATIONS

Abstract. This paper introduces a new Parseval frame, based on the 3-D shearlet representation, which is especially designed to capture geometric features such as discontinuous boundaries with very high efficiency. We show that this approach exhibits essentially optimal approximation properties for 3-D functions $f$ which are smooth away from discontinuities along $C^2$ surfaces. In fact, the $N$-term approximation $f_N^S$ obtained by selecting the $N$ largest coefficients from the shearlet expansion of $f$ satisfies the asymptotic estimate $\|f - f_N^S\|_2^2 \asymp N^{-1} (\log N)^2$, as $N \to \infty$. Up to the logarithmic factor, this is the optimal behavior for functions in this class and significantly outperforms wavelet approximations, which only yield an $N^{-1/2}$ rate. Indeed, the wavelet approximation rate was the best published nonadaptive result so far, and the result presented in this paper is the first nonadaptive construction which is provably optimal (up to a loglike factor) for this class of 3-D data. Our estimate is consistent with the corresponding 2-D (essentially) optimally sparse approximation results obtained by the authors using 2-D shearlets and by Candès and Donoho using curvelets.


Introduction
In a seminal paper published in 2004 [1], Candès and Donoho proved a remarkable result about sparse representations of 2-dimensional data, showing that the curvelet representation, a multiscale system of waveforms defined at various directions and positions at each scale, is essentially as good as an adaptive representation from the point of view of its ability to approximate images containing edges. Specifically, for functions $f$ which are $C^2$ away from $C^2$ edges, the $N$-term approximation $f_N^C$ obtained from the $N$ largest coefficients of its curvelet expansion obeys
\[ \|f - f_N^C\|_2^2 \le C\, N^{-2} (\log N)^3, \quad \text{as } N \to \infty. \tag{1.1} \]
Ignoring the loglike factor, this is the optimal approximation rate for this class of functions while, in comparison, the wavelet and Fourier representations only achieve approximation rates $N^{-1}$ and $N^{-1/2}$, respectively. The work by Candès and Donoho was motivated by fundamental theoretical questions about the mathematical representations of functions containing edge discontinuities, and has great relevance for a variety of technologies and applications. In fact, the notion of sparsity has implications going far beyond approximation theory, since it entails the intimate understanding of the most essential information contained in data, which is critically important for the development of improved algorithms in areas such as data modeling, feature extraction, image denoising and classification [6]. Inspired in part by this work, the shearlet representation was introduced by the authors of this paper and their collaborators in [13,9] as an alternative approach which satisfies the same (essentially) optimally sparse approximation rate (1.1) when dealing with the same class of 2-D data. Similarly to the curvelet construction, the elements of the shearlet system form a multiscale pyramid of waveforms defined at various directions and positions and satisfying parabolic scaling.
However, the shearlet approach relies on a different mathematical framework, based on the structure of affine systems, so that all elements of the representation system are derived from a single generator (or a finite set of generators) through the action of the affine group. The unique properties of the shearlet approach provide not only the benefit of greater flexibility and mathematical simplicity, but also ensure that there is a natural transition from the continuum to the discrete setting [17,8]. This has been exploited in a wide range of very competitive applications such as those presented in [2,5,7,8,11,16,20]. Also notice that, very recently, a novel construction which uses compactly supported shearlet analyzing functions [15] was shown to provide optimally sparse 2D representations as in (1.1).
In this paper, we show that the shearlet approach extends naturally to the 3-dimensional setting, where it also provides optimally sparse nonadaptive representations of 3D data. In fact, we construct a Parseval frame of shearlets to represent 3-dimensional functions $f$ which are smooth away from discontinuities along $C^2$ boundaries, and prove that the $N$-term approximation $f_N^S$, obtained from the $N$ largest coefficients of its shearlet representation, satisfies the estimate
\[ \|f - f_N^S\|_2^2 \le C\, N^{-1} (\log N)^2, \quad \text{as } N \to \infty. \tag{1.2} \]
Up to the logarithmic factor, this is the optimal approximation rate for this type of function [4], in the sense that no orthonormal basis or Parseval frame can yield an approximation rate better than $N^{-1}$.
Even if one considers finite linear combinations of elements taken from arbitrary dictionaries, there is no depth-limited search dictionary that can achieve a rate better than $N^{-1}$ [4]. In contrast, more traditional methods based on wavelet and Fourier approximations are significantly less efficient, since their asymptotic approximation rates only decay as $N^{-1/2}$ and $N^{-1/3}$, respectively [6,18]. Notice that the result presented in this paper is the first nonadaptive construction which is provably optimal (up to a loglike factor) for a large class of 3D data.
1.1. Significance. As in the 2-D case, a simple heuristic argument can be used to justify why a 3-D wavelet system cannot yield an approximation rate better than $N^{-1/2}$, in general, when dealing with 3-D data $f$ containing discontinuous surfaces, while a 3-D shearlet system is expected to provide a much sparser representation. Indeed, at scale $2^{-2j}$, a wavelet $\phi_{j,k}(x) = 2^{3j} \phi(2^{2j} x - k)$ is essentially supported on a box of size $2^{-2j} \times 2^{-2j} \times 2^{-2j}$. Hence, there are approximately $O(2^{4j})$ wavelet coefficients $C_{j,k}(f) = \langle f, \phi_{j,k} \rangle$ associated with the surface of discontinuity (while the remaining coefficients are negligible at fine scales). Since a direct computation shows that, at scale $2^{-2j}$, all these wavelet coefficients are controlled by
\[ |C_{j,k}(f)| \le \|f\|_\infty \, \|\phi_{j,k}\|_{L^1} \le C\, 2^{-3j}, \]
it follows that the $N$-th largest wavelet coefficient $|C_N(f)|$ is bounded by $O(N^{-3/4})$ and, thus, if $f_N^W$ is the approximation of $f$ obtained by taking the $N$ largest coefficients of its wavelet expansion, the $L^2$-error obeys the estimate
\[ \|f - f_N^W\|_{L^2}^2 \le \sum_{m > N} |C_m(f)|^2 \le C\, N^{-1/2}. \]
By contrast, the elements of the shearlet system, denoted by $\psi_{j,\ell,k}$, at scale $2^{-2j}$, are essentially supported on a parallelepiped of size $2^{-2j} \times 2^{-j} \times 2^{-j}$, with location controlled by $k$ and orientation controlled by $\ell$.
At fine scales ($j$ "large"), it is reasonable to assume that the only significant shearlet coefficients $S_{j,\ell,k}(f) = \langle f, \psi_{j,\ell,k} \rangle$ are those corresponding to the shearlet elements which are tangent to the surface of discontinuity, and there are about $O(2^{2j})$ coefficients of this type (see illustration in Figure 1). Again, a direct computation (see expression (1.3)) gives that
\[ \|\psi_{j,\ell,k}\|_{L^1} \le C\, 2^{-2j}, \]
so that, at scale $2^{-2j}$, all these shearlet coefficients are controlled by
\[ |S_{j,\ell,k}(f)| \le \|f\|_\infty \, \|\psi_{j,\ell,k}\|_{L^1} \le C\, 2^{-2j}, \]
and this implies that the $N$-th largest shearlet coefficient $|S_N(f)|$ is bounded by $O(N^{-1})$. Hence, if $f_N^S$ is the approximation of $f$ computed by taking the $N$ largest coefficients of its shearlet expansion, the $L^2$-error approximately obeys the estimate
\[ \|f - f_N^S\|_{L^2}^2 \le \sum_{m > N} |S_m(f)|^2 \le C\, N^{-1}. \]
The rigorous proof of this estimate, which will be described in this paper, requires a more careful examination of the shearlet coefficients, including the contribution of those elements that have been considered negligible in the heuristic argument. As we will see, this adds a logarithmic factor to our estimate, finally yielding (1.2).
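The counting behind this heuristic can be written out once for both systems. The following is only a sketch: it assumes, as in the heuristic, that the coefficients associated with the surface are the only ones that matter, that there are about $2^{aj}$ of them at scale index $j$, and that each is bounded by $C\,2^{-bj}$. The exponents $a$, $b$, the generic coefficient symbol $c$ and the summation index $m$ are notation introduced only for this illustration.

```latex
% Heuristic sketch: about 2^{aj} significant coefficients at scale index j,
% each of size at most C 2^{-bj}. Counting all scales up to j gives
% N \approx \sum_{i \le j} 2^{ai} \approx 2^{aj}, hence 2^{-j} \approx N^{-1/a}.
\begin{align*}
  |c|_{(N)} &\lesssim 2^{-bj} \approx N^{-b/a}, \\
  \sum_{m > N} |c|_{(m)}^2 &\lesssim \sum_{m > N} m^{-2b/a} \approx N^{1 - 2b/a}
  \qquad (2b > a), \\[4pt]
  \text{wavelets: } & a = 4,\ b = 3: \quad |c|_{(N)} \lesssim N^{-3/4}, \quad
  \sum_{m > N} |c|_{(m)}^2 \lesssim N^{-1/2}, \\
  \text{shearlets: } & a = 2,\ b = 2: \quad |c|_{(N)} \lesssim N^{-1}, \quad
  \sum_{m > N} |c|_{(m)}^2 \lesssim N^{-1}.
\end{align*}
```

In this bookkeeping the gain of the shearlet system comes from having far fewer significant coefficients per scale ($2^{2j}$ instead of $2^{4j}$), which more than compensates for the individual coefficients being larger ($2^{-2j}$ instead of $2^{-3j}$).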
1.2. The shearlet representation. The shearlet systems considered in this paper will be derived within the framework of wavelets with composite dilations introduced by the authors and their collaborators in [13,14]. This approach provides a general method for the construction of representation systems made up of functions ranging not only over various scales and locations, as traditional wavelets do, but also over various orientations, with the ability to deal very effectively with the type of anisotropic phenomena which are a main feature of the multidimensional data usually found in applications. Specifically, for $\psi \in L^2(\mathbb{R}^3)$, a 3-D shearlet system is a collection of functions of the form (1.3). Similarly to the 2-D case, we are interested in shearlet systems whose elements are well localized and form a Parseval frame. To achieve this, for $\xi = (\xi_1, \xi_2, \xi_3) \in \mathbb{R}^3$, we define $\psi$ in the frequency domain in terms of two functions $\psi_1$ and $\psi_2$ which satisfy the following assumptions:
Figure 2. Frequency support of a representative shearlet function $\psi_{j,\ell,k}$, inside the pyramidal region $D_C$. The orientation of the support region is controlled by $\ell = (\ell_1, \ell_2)$; its shape becomes more elongated as $j$ increases ($j = 4$ in this plot).
It was shown in [10] that there are several examples of functions satisfying these properties. It follows from equation (1.5) that, for any $j \ge 0$, Hence, using equations (1.4), (1.6) and the observation that a direct computation gives that: Similarly to the corresponding 2-D case [10], the shearlet elements $\psi_{j,\ell,k}$ are well-localized waveforms (in fact, $\hat\psi_{j,\ell,k} \in C_0^\infty(\mathbb{R}^3)$), with frequency support on a parallelepiped of approximate size $2^{2j} \times 2^j \times 2^j$, at various scales depending on $j \in \mathbb{Z}$, with orientations controlled by the two-dimensional index $\ell = (\ell_1, \ell_2) \in \mathbb{Z}^2$ and spatial location $k \in \mathbb{Z}^3$. These supports become increasingly elongated at finer scales (see Figure 2).
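For orientation, here is one concrete form of such a system that is consistent with the support sizes and index ranges quoted in this section. The matrices $A$ and $B^{[\ell]}$ and the product form of $\hat\psi$ below are assumptions made for this sketch, chosen to match the stated parabolic scaling; the precise conditions on $\psi_1$ and $\psi_2$ are not reproduced.

```latex
% Assumed for illustration: anisotropic dilation A and shear B^{[\ell]}.
\[
  A = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \qquad
  B^{[\ell]} = \begin{pmatrix} 1 & \ell_1 & \ell_2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
  \psi_{j,\ell,k}(x) = |\det A|^{j/2}\, \psi\big(B^{[\ell]} A^j x - k\big).
\]
% With \xi written as a row vector, the frequency variable transforms as
\[
  \xi\, A^{-j} B^{[-\ell]}
  = \big( 2^{-2j}\xi_1,\; -\ell_1 2^{-2j}\xi_1 + 2^{-j}\xi_2,\; -\ell_2 2^{-2j}\xi_1 + 2^{-j}\xi_3 \big).
\]
% Assumed product form:
% \hat\psi(\xi) = \hat\psi_1(\xi_1)\,\hat\psi_2(\xi_2/\xi_1)\,\hat\psi_2(\xi_3/\xi_1). Then
\[
  \hat\psi\big(\xi A^{-j} B^{[-\ell]}\big)
  = \hat\psi_1\big(2^{-2j}\xi_1\big)\,
    \hat\psi_2\Big(2^{j}\tfrac{\xi_2}{\xi_1} - \ell_1\Big)\,
    \hat\psi_2\Big(2^{j}\tfrac{\xi_3}{\xi_1} - \ell_2\Big).
\]
```

Under these assumptions, if $\hat\psi_1$ and $\hat\psi_2$ have bounded support, the frequency support of $\psi_{j,\ell,k}$ lies where $|\xi_1| \approx 2^{2j}$, $\xi_2/\xi_1 \approx \ell_1 2^{-j}$ and $\xi_3/\xi_1 \approx \ell_2 2^{-j}$, that is, in a region of size about $2^{2j} \times 2^j \times 2^j$ whose orientation is set by $\ell$. Moreover $|\det A|^{j/2} = 2^{2j}$ and $|\det(B^{[\ell]} A^j)| = 2^{4j}$, so that $\|\psi_{j,\ell,k}\|_{L^1} = 2^{-2j} \|\psi\|_{L^1}$.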
Notice that $L^2(D_C)^\vee$ is a proper subspace of $L^2(\mathbb{R}^3)$. To obtain a Parseval frame for $L^2(\mathbb{R}^3)$, one can construct a second Parseval frame of shearlets with frequency support in the pyramidal region $D_{C_2}$; similarly, one obtains a third Parseval frame of shearlets with frequency support in the pyramidal region $D_{C_3}$. Finally, one can easily construct a Parseval frame (or an orthonormal basis) for the remaining subspace $V_0$. Then any function in $L^2(\mathbb{R}^3)$ can be expressed as a sum $f = P_C f + P_{C_2} f + P_{C_3} f + P_{V_0} f$, where each component corresponds to the orthogonal projection of $f$ onto one of the 4 subspaces of $L^2(\mathbb{R}^3)$ described above. Since the shearlet systems defined on the three pyramidal regions behave very similarly, in the following it will be sufficient to examine the sparsity properties of the "horizontal" system (1.7).

Main Results
Before stating the main theorems, let us define more precisely the class of functions that will be used to model the data we are interested in. We follow [3] and introduce $\mathrm{STAR}^2(A)$, a class of indicator functions of sets $B$ with $C^2$ boundaries $\partial B$. In polar coordinates, let $\rho(\theta, \phi) : [0, 2\pi) \times [0, \pi) \to [0, 1]^2$ be a radius function and define $B$ by $x \in B$ if and only if $|x| \le \rho(\theta, \phi)$. In particular, the boundary $\partial B$ can be parametrized as the surface in $\mathbb{R}^3$: The class of boundaries of interest to us is defined by We say that a set $B \in \mathrm{STAR}^2(A)$ if $B \subset [0, 1]^3$ and $B$ is a translate of a set obeying (2.8) and (2.9). In addition, we set $C_0^2([0, 1]^3)$ to be the collection of twice differentiable functions supported inside $[0, 1]^3$. Finally, we define the set $\mathcal{E}^2(A)$ of functions which are $C^2$ away from a $C^2$ surface as the collection of functions of the form We can now state the following main result.
where $|s(f)|_{(N)}$ denotes the $N$-th largest entry in the sequence $\{s_\mu(f)\}$.
Using Theorem 2.1, we are just one step away from our main result about shearlet approximations. Indeed, let $f_N^S$ be the $N$-term approximation of $f$ obtained from the $N$ largest coefficients of its shearlet expansion, namely
\[ f_N^S = \sum_{\mu \in I_N} \langle f, \psi_\mu \rangle \, \psi_\mu, \]
where $I_N \subset M$ is the set of indices corresponding to the $N$ largest entries of the sequence $\{ |\langle f, \psi_\mu \rangle|^2 : \mu \in M \}$. Since the approximation error satisfies the estimate
\[ \|f - f_N^S\|_2^2 \le \sum_{m > N} |s(f)|_{(m)}^2, \]
from (2.10) we immediately have:
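The step from coefficient decay to approximation error is a tail-sum estimate. As a sketch, assume that the decay estimate for the ordered coefficients takes the form $|s(f)|_{(m)} \le C\, m^{-1} \log m$ (an assumed form, chosen here because it is the one that reproduces the rate (1.2)), and use the fact that a Parseval frame satisfies $\|\sum_\mu c_\mu \psi_\mu\|_2^2 \le \sum_\mu |c_\mu|^2$. The constants $C$, $C'$ and the restriction $N \ge 3$ are part of this illustration only.

```latex
% Sketch under the assumption |s(f)|_{(m)} \le C m^{-1} \log m, for N \ge 3.
\begin{align*}
  \|f - f_N^S\|_2^2
    &= \Big\| \sum_{\mu \notin I_N} \langle f, \psi_\mu \rangle\, \psi_\mu \Big\|_2^2
     \le \sum_{\mu \notin I_N} |\langle f, \psi_\mu \rangle|^2
     = \sum_{m > N} |s(f)|_{(m)}^2 \\
    &\le C^2 \sum_{m > N} m^{-2} (\log m)^2
     \le C^2 \int_N^\infty x^{-2} (\log x)^2 \, dx \\
    &= C^2\, \frac{(\log N)^2 + 2 \log N + 2}{N}
     \le C'\, N^{-1} (\log N)^2 .
\end{align*}
```

The comparison of the sum with the integral uses that $x^{-2} (\log x)^2$ is decreasing for $x > e$, and the last inequality holds with $C' = 5 C^2$ because $\log N > 1$ for $N \ge 3$.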

Arguments and constructions.
For reasons of brevity, it will not be possible to present a complete proof of Theorem 2.1 in this short communication. A complete and detailed proof will appear in a separate work [12]. In the following, we will sketch the main ideas of the proof and emphasize the most significant differences with respect to the 2-D argument. As we will show, the general structure of the proof follows the overall structure of the corresponding 2-dimensional sparsity result in [10]. However, the main technical arguments needed to estimate the effect of the surface discontinuities in the shearlet representation require the introduction of a fundamentally new approach which is significantly different from the 2D case.
As in [10], it will be convenient to introduce the weak-$\ell^p$ quasi-norm $\|\cdot\|_{w\ell^p}$ to measure the sparsity of the shearlet coefficients $\{\langle f, \psi_\mu \rangle : \mu \in M\}$. This is defined as
\[ \|s_\mu\|_{w\ell^p} = \sup_{N > 0} N^{1/p} \, |s_\mu|_{(N)}, \]
where $|s_\mu|_{(N)}$ is the $N$-th largest entry in the sequence $\{s_\mu\}$, and it is equivalent (cf. [19, Sec. 5.3]) to the expression
\[ \|s_\mu\|_{w\ell^p} = \Big( \sup_{\epsilon > 0} \#\{\mu : |s_\mu| > \epsilon\} \, \epsilon^p \Big)^{1/p}. \]
To analyze the decay properties of the shearlet coefficients $\{\langle f, \psi_\mu \rangle\}_{\mu \in M}$ at a given scale $2^{-j}$, $j \ge 0$, we will smoothly localize the function $f$ near dyadic cubes. Namely, for a fixed scale parameter $j \ge 0$, let $M_j = \{(j, \ell, k) : -2^j \le \ell_1, \ell_2 \le 2^j,\ k \in \mathbb{Z}^3\}$ and $\mathcal{Q}_j$ be the collection of dyadic cubes of the form For $w$ a nonnegative $C^\infty$ function with support in $[-1, 1]^3$, we define a smooth partition of unity $\{w_Q : Q \in \mathcal{Q}_j\}$. We will then examine the shearlet coefficients of the localized function $f_Q = f\, w_Q$. It turns out that for $f \in \mathcal{E}^2(A)$, the coefficients $\{\langle f_Q, \psi_\mu \rangle : \mu \in M_j\}$ exhibit a very different decay behavior depending on whether the surface intersects the support of $w_Q$ or not. Let $\mathcal{Q}_j = \mathcal{Q}_j^0 \cup \mathcal{Q}_j^1$, where the union is disjoint and $\mathcal{Q}_j^0$ is the collection of those dyadic cubes $Q \in \mathcal{Q}_j$ such that the surface intersects the support of $w_Q$. Since each $Q$ has sidelength $2 \cdot 2^{-j}$, $\mathcal{Q}_j^0$ has cardinality $|\mathcal{Q}_j^0| \le C_0\, 2^{2j}$, where $C_0$ is independent of $j$. Similarly, since $f$ is compactly supported in $[0, 1]^3$, $|\mathcal{Q}_j^1| \le 2^{3j} + 6 \cdot 2^{2j}$. Using this notation, we can now state the basic results that are needed to prove Theorem 2.1.
for some constant C independent of Q and j.
for some constant C independent of Q and j.
Before discussing those proofs, let us show how Theorems 2.3 and 2.4 are used to prove Theorem 2.1.
The proofs of Theorems 2.3 and 2.4 are rather involved. Theorem 2.4, in particular, follows essentially the same ideas as the 2-D case; this is not surprising since it deals with the situation where the shearlets are away from the discontinuity. In contrast, Theorem 2.3, which deals with shearlet coefficients associated with the discontinuous surface, requires the introduction of new analytical tools which are very different from the 2-D case. In the following section, we will sketch the main ideas of the proof of Theorem 2.3.

2.2. Proof of Theorem 2.3 (sketch). Let us consider a function $f \in \mathcal{E}^2(A)$ which contains a $C^2$ surface of discontinuity. For $j > j_0$ sufficiently large, over a cube of side $2^{-j}$, the surface of discontinuity can be parametrized as a graph over two of the coordinate variables. For simplicity, without loss of generality, we will assume that this surface, denoted by $\Sigma$, satisfies the equation and that, for all $m, n \in \mathbb{N}$ such that $m + n = 2$, we have: This ensures that the surface is locally nearly flat near the origin. In the following, we will only discuss the situation $j > j_0$. The situation when $j \le j_0$ is much simpler and will not be discussed here.
The key step in the following argument is the estimation of the decay of the shearlet coefficients corresponding to the surface of discontinuity, and this requires a more elaborate argument than in the 2D case. For this analysis, it is useful to consider the following localized version of $f$. Let $w$ be a nonnegative $C^\infty$ window function with support in $[-1, 1]^3$, and define a surface fragment as a function of the form: where $g \in C_0^2((-1, 1)^3)$. We have the following fundamental result, whose proof requires several delicate steps (involving the computation of the Ray Transform of $f$) but will be omitted due to space constraints. Without loss of generality we may assume that $|\ell_1| \le |\ell_2|$, due to the fact that the variables $\xi_2$ and $\xi_3$ are symmetric in the construction of the horizontal shearlets.