Anisotropic Denoising in Functional Deconvolution Model with Dimension-free Convergence Rates

In the present paper we consider the problem of estimating a periodic $(r+1)$-dimensional function $f$ based on observations from its noisy convolution. We construct a wavelet estimator of $f$, derive minimax lower bounds for the $L^2$-risk when $f$ belongs to a Besov ball of mixed smoothness, and demonstrate that the wavelet estimator is adaptive and asymptotically near-optimal, within a logarithmic factor, in a wide range of Besov balls. In particular, we prove that choosing this type of mixed smoothness leads to rates of convergence which are free of the "curse of dimensionality" and, hence, are higher than the usual convergence rates when $r$ is large. The problem studied in the paper is motivated by seismic inversion, which can be reduced to the solution of noisy two-dimensional convolution equations that allow one to draw inference on underground layer structures along the chosen profiles. The common practice in seismology is to recover layer structures separately for each profile and then to combine the derived estimates into a two-dimensional function. By studying the two-dimensional version of the model, we demonstrate that this strategy usually leads to estimators which are less accurate than the ones obtained as two-dimensional functional deconvolutions. Indeed, we show that, unless the function $f$ is very smooth in the direction of the profiles, very spatially inhomogeneous along the other direction, and the number of profiles is very limited, the functional deconvolution solution has much better precision compared to a combination of $M$ solutions of separate convolution equations. A limited simulation study in the case of $r=1$ confirms the theoretical claims of the paper.

We consider the estimation of $f$ from observations of its noisy convolution
$$y(u, t) = \int_0^1 g(u, t - x) f(u, x)\, dx + \varepsilon z(u, t), \quad u \in [0, 1]^r,\ t \in [0, 1]. \quad (1.1)$$
Here, $\varepsilon$ is a small positive parameter such that asymptotically $\varepsilon \to 0$. The function $g(\cdot, \cdot)$ in (1.1) is assumed to be known, and $z(u, t)$ is an $(r+1)$-dimensional Gaussian white noise, i.e., a generalized $(r+1)$-dimensional Gaussian field with covariance function
$$\mathrm{E}\left[ z(u_1, t_1)\, z(u_2, t_2) \right] = \delta(t_1 - t_2) \prod_{l=1}^r \delta(u_{1l} - u_{2l}),$$
where $\delta(\cdot)$ denotes the Dirac $\delta$-function and $u_i = (u_{i1}, \cdots, u_{ir}) \in [0, 1]^r$, $i = 1, 2$. In the two-dimensional case, the sampling version (1.4) of the continuous problem (1.3) records observations on a grid, with $y_l(t_i) = y(u_l, t_i)$, $f_l(x) = f(u_l, x)$ and $g_l(t_i - x) = g(u_l, t_i - x)$, and one might be tempted to treat (1.4) as a collection of $M$ separate convolution equations (1.5). This is, however, not true, since the solution of equation (1.4) is a two-dimensional function while the solutions of equations (1.5) are $M$ unrelated functions $f_l(t)$. In this sense, problem (1.3) and its sampling equivalent (1.4) are functional deconvolution problems. Functional deconvolution problems were introduced in Pensky and Sapatinas (2009) and further developed in Pensky and Sapatinas (2010, 2011). However, Pensky and Sapatinas (2009, 2010, 2011) considered a different version of the problem, where $f(u, t)$ was a function of one variable, i.e., $f(u, t) \equiv f(t)$. Their interpretation of the functional deconvolution problem was motivated by the solution of inverse problems in mathematical physics and of multichannel deconvolution in engineering practice. Functional deconvolution problems of types (1.3) and (1.4) are motivated by experiments where one needs to recover a two-dimensional function using observations of its convolutions along profiles $u = u_l$. This situation occurs, for example, in geophysical explorations, in particular, those which rely on inversions of seismic signals (see, e.g., the monographs of Robinson et al. (1996) and Robinson (1999) and, e.g., the papers of Wason et al. (1984), Berkhout (1986) and Heimer and Cohen (2008)).
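As a minimal illustration, the observation model (1.1) with $r = 1$ can be discretized as follows. This is a sketch under our own assumptions: the grid sizes, the periodic FFT-based convolution, and the particular test functions $f$ and $g$ below are illustrative choices, not the paper's simulation settings.

```python
import numpy as np

# Hypothetical discretization of model (1.1) with r = 1: y(u, t) observed on an
# M x N grid, periodic convolution in t, and eps scaling the white noise.
def simulate_observations(f, g, M=64, N=128, eps=0.05, seed=0):
    rng = np.random.default_rng(seed)
    u = np.arange(M) / M          # profile coordinates u_l in [0, 1)
    t = np.arange(N) / N          # time points t_i in [0, 1)
    y = np.empty((M, N))
    for l in range(M):
        f_l = f(u[l], t)          # f(u_l, .) sampled on the t-grid
        g_l = g(u[l], t)          # kernel g(u_l, .) sampled on the t-grid
        # circular convolution approximating the integral over [0, 1]
        conv = np.real(np.fft.ifft(np.fft.fft(g_l) * np.fft.fft(f_l))) / N
        y[l] = conv + eps * rng.standard_normal(N)
    return u, t, y

# Illustrative smooth test pair (arbitrary choices for demonstration only)
f = lambda u, t: np.sin(2 * np.pi * t) * (1 + u)
g = lambda u, t: 0.5 * np.exp(-np.abs(t) * (1 + (u - 0.5) ** 2))
u, t, y = simulate_observations(f, g)
```

The loop over profiles makes explicit that each row of $y$ is a one-dimensional convolution equation; the functional deconvolution viewpoint treats all rows jointly.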
In seismic exploration, a short-duration seismic pulse is transmitted from the surface, reflected from boundaries between underground layers, and received by an array of sensors on the Earth's surface. The signals are transmitted along straight lines called profiles. The received signals, called seismic traces, are analyzed to extract information about the underground structure of the layers along the profile. Under simplifying assumptions, these traces can be modeled as noisy outcomes of convolutions between reflectivity sequences, which describe the configuration of the layers, and a short wave-like function (called a wavelet in geophysics) which corresponds to the convolution kernel. The objective of seismic deconvolution is to estimate the reflectivity sequences from the measured traces. In the simple case of one layer and a single profile, the boundary is described by a univariate function which is the solution of the convolution equation. The next step is usually to combine the recovered functions, which are defined on the set of parallel planes passing through the profiles, into a multivariate function which provides a complete picture of the structure of the underground layers. This is usually accomplished by interpolation techniques. However, since the layers are intrinsically anisotropic (they may have different structures in various directions) and spatially inhomogeneous (they may experience, for example, sharp breaks), this approach ignores the anisotropic and spatially inhomogeneous nature of the two-dimensional function describing the layer and loses precision by analyzing each profile separately.
The paper carries out the following program:
i) Construction of a feasible procedure $\hat f(u, t)$ for estimating the $(r+1)$-dimensional function $f(u, t)$ which achieves optimal rates of convergence (up to inessential logarithmic terms). We require $\hat f(u, t)$ to be adaptive with respect to smoothness constraints on $f$. In this sense, the paper is related to a multitude of papers which offered wavelet solutions to deconvolution problems (see, e.g., Donoho (1995)).
ii) Identification of the best achievable accuracy under smoothness constraints on $f$. We focus here on obtaining fast rates of convergence. In this context, we prove that considering multivariate functions with "mixed" smoothness and hyperbolic wavelet bases allows one to obtain rates which are free of dimension and, as a consequence, faster than the usual ones. In particular, the present paper is related to the anisotropic de-noising explored by, e.g., Picard (2001, 2008). We compare our functional classes as well as our rates with the results obtained there.
iii) Comparison of the two-dimensional version of the functional deconvolution procedure studied in the present paper with separate solutions of the convolution equations. We show, in particular, that the former approach delivers estimators with higher precision. For this purpose, in Section 5, we consider a discrete version of the functional deconvolution problem (1.4) (rather than the continuous equation (1.3)) and compare its solution with the solutions of $M$ separate convolution equations (1.5). We show that, unless the function $f$ is very smooth in the direction of the profiles, very spatially inhomogeneous along the other direction, and the number of profiles is very limited, the functional deconvolution solution has better precision than the combination of $M$ solutions of separate convolution equations.
The rest of the paper is organized as follows. In order to make the paper more readable, and due to the application to seismic inversion, we start, in Section 2, with the two-dimensional version of the functional deconvolution problem (1.3) and describe the construction of a two-dimensional wavelet estimator of the function $f(u, t)$ in equation (1.3). In Section 3, we give a brief introduction to spaces of anisotropic smoothness. After that, we derive minimax lower bounds for the $L^2$-risk, based on observations from (1.3), under the condition that $f(u, t)$ belongs to a Besov ball of mixed regularity and $g(u, x)$ has certain smoothness properties. In Section 4, we prove that the hyperbolic wavelet estimator derived in Section 2 is adaptive and asymptotically near-optimal within a logarithmic factor (in the minimax sense) in a wide range of Besov balls. Section 5 is devoted to the discrete version of the problem (1.4) and to the comparison of the functional deconvolution solution with the collection of individual deconvolution solutions. Section 6 extends the results to the $(r+1)$-dimensional version of the problem (1.1). Section 7 contains a limited simulation study which supports the theoretical claims of the paper. We conclude the paper with a discussion of the results in Section 8. Finally, Section 9 contains the proofs of the theoretical results obtained in the earlier sections.
Consider a bounded-bandwidth periodized wavelet basis (e.g., Meyer-type) $\psi_{j,k}(t)$ and a finitely supported periodized $s_0$-regular wavelet basis (e.g., Daubechies) $\eta_{j',k'}(u)$. The choice of the Meyer wavelet basis for $t$ is motivated by the fact that it allows easy evaluation of the wavelet coefficients in the Fourier domain, while a finitely supported wavelet basis gives more flexibility in recovering a function which is spatially inhomogeneous in $u$. Let $m_0$ and $m'_0$ be the lowest resolution levels for the two bases, and denote the scaling functions for the bounded-bandwidth wavelet by $\psi_{m_0-1,k}(t)$ and the scaling functions for the finitely supported wavelet by $\eta_{m'_0-1,k'}(u)$. Then $f(u, x)$ can be expanded into the wavelet series
$$f(u, x) = \sum_{j=m_0-1}^{\infty} \sum_{j'=m'_0-1}^{\infty} \sum_k \sum_{k'} \beta_{j,k,j',k'}\, \psi_{j,k}(x)\, \eta_{j',k'}(u). \quad (2.1)$$
Denote $\beta_{j,k}(u) = \langle f(u, \cdot), \psi_{j,k}(\cdot) \rangle$; then $\beta_{j,k,j',k'} = \langle \beta_{j,k}(\cdot), \eta_{j',k'}(\cdot) \rangle$. If $\psi_{j,k,m} = \langle e_m, \psi_{j,k} \rangle$ are the Fourier coefficients of $\psi_{j,k}$, then, by formula (2.1) and Plancherel's formula, $\beta_{j,k}(u)$ can be expressed through the Fourier coefficients of $f(u, \cdot)$, where, for any $j \geq j_0$, the sum over $m$ runs over the finite set in (2.4), due to the fact that Meyer wavelets are band-limited (see, e.g., Johnstone, Kerkyacharian, Picard and Raimondo (2004), Section 3.1). Therefore, the coefficients $\beta_{j,k,j',k'}$ are of the form (2.5) and allow the unbiased estimator (2.6). We now construct a hard thresholding estimator of $f(u, t)$ as
$$\hat f(u, t) = \sum_{j=m_0-1}^{J-1} \sum_{j'=m'_0-1}^{J'-1} \sum_k \sum_{k'} \hat\beta_{j,k,j',k'}\, \mathbb{1}\left( |\hat\beta_{j,k,j',k'}| > \lambda_{j\varepsilon} \right) \psi_{j,k}(t)\, \eta_{j',k'}(u), \quad (2.7)$$
where the values of $J$, $J'$ and $\lambda_{j\varepsilon}$ will be defined later.
In what follows, we use the symbol C for a generic positive constant, independent of ε, which may take different values at different places.

Smoothness classes
It is natural to consider anisotropic multivariate functions, i.e., functions whose smoothness differs in different directions. It is, however, much more difficult to construct appropriate spaces of mixed regularity which are meaningful for applications. One of the objectives of the present paper is to prove that classes of mixed regularity allow one to obtain rates of convergence which are free of dimension. This is specifically due to the application of hyperbolic wavelets, i.e., wavelets which allow different resolution levels in each direction (see, e.g., Heping (2004)).
Although a comprehensive study of functional classes of mixed regularity is not the purpose of this paper, below we provide a short introduction to the functional classes that we are going to consider. Due to the relation of this paper to the anisotropic de-noising explored by Picard (2001, 2008), we also compare the classes of mixed regularity used here to the Nikolski classes considered in the papers cited above.
Let $f$ be a measurable function defined on $\mathbb{R}^d$. For any $x, y \in \mathbb{R}^d$, we define the difference operator $\Delta_y f(x) = f(x + y) - f(x)$. If $l \in \mathbb{N}$, then $\Delta^l_y$ is the $l$-iterated version of the operator $\Delta_y$ (of course, $\Delta^0_y = \mathrm{Id}$, where $\mathrm{Id}$ is the identity operator). Then Nikolski classes can be defined as follows. Let $e_1, \ldots, e_d$ be the canonical basis of $\mathbb{R}^d$. For $0 < s_i < \infty$, $1 \leq p_i \leq \infty$, we say that $f$ belongs to $N^{s_i}_{p_i,\infty}$ if and only if there exist $l \in \mathbb{N}$, $s_i < l$, and $C(s_i, l) < \infty$, such that for any $h \in \mathbb{R}$ one has
$$\left\| \Delta^l_{h e_i} f \right\|_{p_i} \leq C(s_i, l)\, |h|^{s_i}$$
(with the usual modification for $p_i = \infty$). The Nikolski classes defined above were investigated by Picard (2001, 2008); they are anisotropic but do not involve mixed smoothness. Quite differently, in the present paper we shall consider classes of mixed regularity defined as follows. Denote $h = (h_1, \ldots, h_d)$, $t = (t_1, \ldots, t_d)$, $s = (s_1, \ldots, s_d)$, and let $t_i > 0$, $s_i > 0$, $i = 1, \cdots, d$. For a subset $e \subset \{1, \ldots, d\}$, we set $h_e$ to be the vector with coordinates $h_i$ when $i$ belongs to $e$, and $0$ otherwise. For a fixed integer $l$ and $1 \leq p \leq \infty$, we consider the mixed moduli of smoothness built from the iterated differences $\Delta^l_{h_e}$ taken in the directions belonging to $e$. Now, in order to construct Besov classes of mixed regularity, we choose $l \geq \max_j s_j$ and define the classes through conditions on these mixed moduli; it is proved in, e.g., Heping (2004) that these classes admit an equivalent characterization in terms of hyperbolic wavelet coefficients, with the obvious modifications for $q < \infty$ rather than $q = \infty$. In this paper, we shall assume that the hyperbolic wavelet basis satisfies the required regularity conditions and follow Heping's (2004) definition of the Besov spaces of mixed regularity (3.2). The Besov classes (3.2) compare quite easily to the Nikolski classes: it is easy to prove that the former form a subset of the latter.
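The iterated difference operator underlying both definitions is easy to evaluate numerically. The following sketch implements $\Delta^l_y f(x)$ in one direction via the standard binomial expansion; the function names are our own.

```python
import numpy as np
from math import comb

# l-iterated difference operator along a single coordinate direction:
# Delta_y^l f(x) = sum_{k=0}^{l} (-1)^(l-k) * C(l, k) * f(x + k*y)
def iterated_difference(f, x, y, l):
    return sum((-1) ** (l - k) * comb(l, k) * f(x + k * y) for k in range(l + 1))
```

A classical sanity check: $\Delta^l_y$ annihilates polynomials of degree below $l$, which is why the definition of $N^{s_i}_{p_i,\infty}$ requires $l > s_i$.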

Lower bounds for the risk: two-dimensional case
In what follows, we assume that the function $f(u, t)$ belongs to a two-dimensional Besov ball as described above ($d = 2$), so that the wavelet coefficients $\beta_{j,k,j',k'}$ satisfy the corresponding Besov-ball condition. Below, we construct minimax lower bounds for the $L^2$-risk. For this purpose, we define the minimax $L^2$-risk over the set $V$ as
$$R_\varepsilon(V) = \inf_{\hat f} \sup_{f \in V} \mathrm{E} \left\| \hat f - f \right\|^2,$$
where $\|g\|$ is the $L^2$-norm of a function $g(\cdot)$ and the infimum is taken over all possible estimators $\hat f(\cdot)$ (measurable functions taking their values in a set containing $V$) of $f(\cdot)$. Assume that the functional Fourier coefficients $g_m(u)$ of the function $g(u, t)$ are uniformly bounded from above and below; that is, there exist positive constants $\nu$, $C_1$ and $C_2$, independent of $m$ and $u$, such that
$$C_1 |m|^{-2\nu} \leq |g_m(u)|^2 \leq C_2 |m|^{-2\nu}. \quad (3.5)$$
Then the following theorem gives the minimax lower bounds for the $L^2$-risk of any estimator $\hat f_n$ of $f$.
Note that the value of $d$ in (3.7) admits an equivalent representation.

Remark 1. Note that the rates obtained here are, in fact, the worst rates associated with the one-dimensional problem in each direction. This is not surprising, since a function of only one variable which is constant in the other direction, e.g., $f(u_1, u_2) = h(u_1)$, belongs to $B^{s_1,s_2}_{p,q}(A)$ as soon as $h$ belongs to a ball of the usual one-dimensional Besov space $B^{s_1}_{p,q}$, for any $s_2$.
It is also worthwhile to observe that the third rate (involving $s'_1$) corresponds, in dimension one, to a "sparse" rate. Hence, we observe here the so-called "elbow phenomenon" occurring only along direction 2, because we are considering an $L^2$-loss and the problem has a degree of ill-posedness $\nu$ precisely in this direction.

Minimax upper bounds.
Before deriving expressions for the minimax upper bounds for the risk, we formulate several useful lemmas which give some insight into the choice of the thresholds $\lambda_{j\varepsilon}$ and of the upper limits $J$ and $J'$ in the sums in (2.7). Lemma 1 provides an upper bound for the variance of the estimators (2.6) under assumption (3.5), and suggests that the thresholds $\lambda_{j\varepsilon}$ should be chosen as
$$\lambda_{j\varepsilon} = C_\beta\, 2^{j\nu}\, \varepsilon \sqrt{\ln(1/\varepsilon)}, \quad (4.2)$$
where $C_\beta$ is some positive constant independent of $\varepsilon$. We choose $J$ and $J'$ as in (4.3). Note that the choices of $J$, $J'$ and $\lambda_{j\varepsilon}$ are independent of the parameters $s_1$, $s_2$, $p$, $q$ and $A$ of the Besov ball $B^{s_1,s_2}_{p,q}(A)$, and therefore our estimator (2.7) is adaptive with respect to those parameters.
The next two lemmas provide upper bounds for the wavelet coefficients and the large deviation inequalities for their estimators.
Theorem 2. Let $\hat f(\cdot, \cdot)$ be the wavelet estimator defined in (2.7), with $J$ and $J'$ given by (4.3). Let condition (3.5) hold and let $\min\{s_1, s_2\}$ satisfy condition (4.6), where $C_1$ is defined in (3.5). Then, as $\varepsilon \to 0$, the $L^2$-risk of $\hat f$ is bounded above by the rate of the lower bound, up to a logarithmic factor, where $d$ is defined in (3.7).

Remark 2. Looking at the previous results, we conclude that the rates attained by the wavelet estimator defined in (2.7) are optimal, in the minimax sense, up to logarithmic factors. These factors are standard and come from the thresholding procedure.

Sampling version of the equation and comparison with separate deconvolution recoveries
Consider now the sampling version (1.4) of problem (1.3). In this case, the estimators of the wavelet coefficients $\beta_{j,k,j',k'}$ can be constructed as in (5.1); in practice, $\hat\beta_{j,k,j',k'}$ are obtained simply by applying the discrete wavelet transform to the vectors $y_m(\cdot)/g_m(\cdot)$. For any two sequences $a_n$ and $b_n$, one says that $a_n \asymp b_n$ as $n \to \infty$ if $0 < C_1 < a_n/b_n < C_2 < \infty$ for some constants $C_1$ and $C_2$ independent of $n$. Recall that the continuous versions (2.6) of the estimators (5.1) have $\mathrm{Var}(\hat\beta_{j,k,j',k'}) \asymp \varepsilon^2 2^{2j\nu}$ (see formula (4.1)). In order to show that equation (1.4) is the sampling version of (1.3) with $\varepsilon^2 = \sigma^2/(MN)$, one needs to show that, in the discrete case, $\mathrm{Var}(\hat\beta_{j,k,j',k'}) \asymp \sigma^2 (MN)^{-1} 2^{2j\nu}$. This is indeed accomplished by the following lemma.
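The practical recipe above, i.e., applying a discrete wavelet transform to the vectors $y_m(\cdot)/g_m(\cdot)$, can be sketched as follows. This is a simplified illustration under our own assumptions: the FFT along $t$ stands in for the functional Fourier coefficients, and the small-denominator guard `floor` is an illustrative numerical safeguard, not part of the paper's method.

```python
import numpy as np
import pywt

# Fourier-deconvolve each frequency m along t, producing the vectors
# h_m(u_l) = y_m(u_l) / g_m(u_l) to which the DWT in u is then applied.
def deconvolve_rows(y, g, floor=1e-8):
    # y, g: (M, N) arrays sampled on the (u, t) grid
    Y = np.fft.fft(y, axis=1)
    G = np.fft.fft(g, axis=1)
    G = np.where(np.abs(G) < floor, floor, G)   # guard against division by ~0
    return Y / G

def coeffs_in_u(h, wav_u='db6', level=2):
    # DWT in u (axis 0) of the deconvolved vectors; real part only, for brevity
    return pywt.wavedec(np.real(h), wav_u, level=level, axis=0,
                        mode='periodization')
```

In the noiseless case with a well-conditioned kernel, inverting the FFT of `deconvolve_rows(y, g)` recovers $f$ exactly on the grid, which is a useful correctness check before noise and thresholding enter.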
Using the tools developed in Pensky and Sapatinas (2009) and Lemma 4, it is easy to formulate the lower and upper bounds for the convergence rates of the estimator (2.7) with $\hat\beta_{j,k,j',k'}$ given by (2.8) and the values of $\lambda_{j\varepsilon}$ and $J$, $J'$ defined in (4.2) and (4.3), respectively. In particular, we obtain the following statement.
Now, let us compare the rates in Theorem 3 with the rates obtained by recovering each deconvolution $f_l(t) = f(u_l, t)$, $u_l = l/M$, $l = 1, \cdots, M$, separately, using equations (1.5). In order to do this, we need to determine in which space the functions $f_l(x)$ are contained. The following lemma provides the necessary conclusion.
Assume, as before, that the functional Fourier coefficients $g_m(u)$ of the function $g(u, t)$ are uniformly bounded from above and below, and that the function $f(u, t)$ belongs to an $(r+1)$-dimensional Besov ball. To define these Besov balls, as described in Section 3.1, we introduce the vector $s_2 = (s_{21}, \cdots, s_{2r})$ and denote by $s'_2$ and $s^*_2$ the vectors with components $s'_{2l} = s_{2l} + 1/2 - 1/p'$ and $s^*_{2l} = s_{2l} + 1/2 - 1/p$, $l = 1, \cdots, r$, respectively, where $p' = \min\{p, 2\}$. If $s_0 \geq \max_l s_{2l}$, then the $(r+1)$-dimensional Besov ball of radius $A$ is characterized by a condition (6.7) on its wavelet coefficients $\beta_{j,k,j',k'}$ (see, e.g., Heping (2004)). It is easy to show that, with the above assumptions, similarly to the two-dimensional case, the analogous bounds hold as $\varepsilon \to 0$. The upper and lower bounds for the risk are expressed via
$$s_{2,0} = \min_{l=1,\cdots,r} s_{2,l} = s_{2,l_0}, \quad (6.10)$$
where $l_0 = \arg\min_l s_{2,l}$. In particular, the following statements hold.

Simulations.
In order to investigate the finite-sample performance of our estimator, we carried out a limited simulation study. We used the WaveLab package for Matlab and carried out simulations using a degree 3 Meyer wavelet and degree 6 Daubechies wavelets. We generated data using equation (1.4) with the kernel $g(u, t) = 0.5 \exp(-|t| (1 + (u - 0.5)^2))$, various functions $f(u, t)$, and various values of $M$, $N$ and $\sigma$. In particular, we used $N = 512$, $M = 128$ or $M = 256$, $\sigma = 0.5$ or $\sigma = 1.0$, and $f(u, t) = f_1(u) f_2(t)$, where $f_1(u)$ and $f_2(t)$ are standard test functions routinely used in testing signal processing techniques (introduced by Donoho and Johnstone (1994)). In particular, we utilize the functions blip, bumps, and quadratic, with quadratic just being the quadratic function $(y - 0.5)^2$ scaled to have unit norm. Note that, though $f(u, t)$ is a product of two univariate functions, the method does not "know" this and, therefore, cannot take advantage of this information.
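The simulation kernel above and a grid-based MISE approximation can be written down directly; the function names and the grid-average form of the integrated squared error are our own conventions.

```python
import numpy as np

# The simulation kernel g(u, t) = 0.5 * exp(-|t| (1 + (u - 0.5)^2)) from this
# section, vectorized over numpy arrays.
def kernel(u, t):
    return 0.5 * np.exp(-np.abs(t) * (1 + (u - 0.5) ** 2))

# Grid approximation of the MISE E||f_hat - f||^2 over repeated simulation runs:
# estimates has shape (R, M, N) for R runs; truth has shape (M, N).
def mise(estimates, truth):
    err = np.mean((estimates - truth[None]) ** 2, axis=(1, 2))
    return err.mean(), err.std()   # average ISE and its standard deviation
```

The pair returned by `mise` corresponds to the entries reported in Table 1: an error average over the runs together with its standard deviation.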
Graphs of all test functions are presented in Figure 1. Table 1 contains the simulation results. We generated data and constructed the functional deconvolution estimator (2.7), as well as $M$ Fourier-wavelet deconvolution estimators of Johnstone, Kerkyacharian, Picard and Raimondo (2004). We evaluated the mean integrated squared error (MISE) $\mathrm{E}\|\hat f - f\|^2$ of the functional deconvolution estimator and the average MISE of the $M$ Fourier-wavelet deconvolution estimators. Table 1 reports the averages of those errors over 100 simulation runs, together with their standard deviations (in parentheses).
Simulation results confirm that, as $M$ grows, functional deconvolution becomes more advantageous than $M$ separate deconvolutions. Indeed, while the error of the functional deconvolution estimator declines as $M$ grows, the average error of the $M$ separate deconvolution estimators remains the same.

Discussion.
i) In the present paper, we constructed functional deconvolution estimators based on the hyperbolic wavelet thresholding procedure. We derived the lower and the upper bounds for the minimax convergence rates which confirm that estimators derived in the paper are adaptive and asymptotically near-optimal, within a logarithmic factor, in a wide range of Besov balls of mixed regularity.
ii) Although the results of Picard (2001, 2008) were obtained in a slightly different framework (no convolution), they can nevertheless be compared with the results presented above. Set $\nu = 0$ to account for the absence of convolution, $p_i = p$ and $d = r + 1$. Then the convergence rates in the latter can be identified as the rates of a one-dimensional setting with a regularity parameter equal to the harmonic mean $\bar s$, defined by $1/\bar s = \sum_{i=1}^d 1/s_i$. In our case, the rates can also be identified as the rates of a one-dimensional setting, but with the regularity parameter $\min_i s_i$, which is always larger than $\bar s$. Moreover, if $s_i = s$ for all $i$, one obtains $\bar s = s/d < s = \min_i s_i$, showing that the estimators of Picard (2001, 2008) in the Nikolski spaces are affected by "the curse of dimensionality", while the estimators in the anisotropic Besov spaces of mixed regularity considered in this paper are free of dimension and, therefore, have higher convergence rates.
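The comparison in item ii) can be checked numerically. The sketch below assumes the harmonic-mean convention $1/\bar s = \sum_i 1/s_i$ used in our reading of the comparison; the regularity values are illustrative.

```python
# Harmonic-mean regularity \bar s with 1/\bar s = sum_i 1/s_i (assumed
# convention); it never exceeds min_i s_i, and for equal s_i = s it
# degrades to s/d, which is the "curse of dimensionality".
def harmonic_regularity(s):
    return 1.0 / sum(1.0 / si for si in s)

s = [1.5, 2.0, 3.0]                       # illustrative smoothness indices
assert harmonic_regularity(s) <= min(s)   # \bar s <= min_i s_i always holds
```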
iii) The problem studied in this paper is related to seismic inversion, which can be reduced to the solution of noisy convolution equations which deliver underground layer structures along the chosen profiles. The common practice in seismology, however, is to recover layer structures separately for each profile and then to combine them together. This is usually not the best strategy, and it leads to estimators which are inferior to the ones obtained as two-dimensional functional deconvolutions. Indeed, as shown above, unless the function $f$ is very smooth in the direction of the profiles, very spatially inhomogeneous along the other direction, and the number of profiles is very limited, the functional deconvolution solution has precision superior to the combination of $M$ solutions of separate convolution equations. The precise condition under which separate recoveries are preferable to the two-dimensional one is given by formula (5.6), which is essentially very reasonable. Indeed, if the number $M$ of profiles is small, there is no reason to treat $f$ as a two-dimensional function. A small value of $s_2$ indicates that $f$ is very spatially inhomogeneous and, therefore, the links between its values on different profiles are very weak. Finally, if $s_1$ is large, the deconvolutions are quite precise, so that combining various profiles cannot improve the precision.

Acknowledgments
Marianna Pensky and Rida Benhaddou were partially supported by National Science Foundation (NSF) grant DMS-1106564.

Proofs.

Proof of the lower bounds for the risk.
In order to prove Theorem 1, we consider two cases, the case when f (u, t) is dense in both variables (the dense-dense case) and the case when f (u, t) is dense in u and sparse in t (the sparse-dense case). The proof is based on Lemma A.1 of Bunea, Tsybakov and Wegkamp (2007) which we reformulate here for the case of squared risk.

Proofs of supplementary lemmas.
Proof of Lemma 1. Let us derive an upper bound for the variance of (2.6). Subtracting (2.5) from (2.6), we obtain an expression (9.12) for $\hat\beta_{j,k,j',k'} - \beta_{j,k,j',k'}$ in terms of the noise. Before we proceed to the derivation of the upper bound of the variance, let us first state a result that will be used in our calculation. Recall from stochastic calculus that, for any function $F(t, u) \in L^2([0, 1] \times [0, 1])$, one has
$$\mathrm{E} \left( \int_0^1 \int_0^1 F(t, u)\, dz(u, t) \right)^2 = \int_0^1 \int_0^1 F^2(t, u)\, dt\, du. \quad (9.13)$$
Hence, recalling that $z_m(u) = \int_0^1 z(u, t) e_m(t)\, dt$, squaring both sides of (9.12), taking expectations, and using relation (9.13), we obtain (9.14), since in the double summation all terms involving $m \neq m'$ vanish, due to $\int_0^1 e_m(t) e_{m'}(t)\, dt = 0$. Consequently, taking into account (2.4), (3.5) and the fact that $|\psi_{j,k,m}| \leq 2^{-j/2}$, we obtain (9.15), so that (4.1) holds.
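The isometry (9.13) can be illustrated by Monte Carlo on a grid, with the white-noise integral approximated by a sum of independent Gaussian increments; the discretization below is an illustrative assumption, not part of the proof.

```python
import numpy as np

# Monte Carlo check of E( int F dz )^2 = int F^2 for one-dimensional white
# noise: each grid cell of width 1/n contributes an N(0, 1/n) increment.
def mc_isometry(F, n_grid=128, n_rep=20000, seed=1):
    rng = np.random.default_rng(seed)
    t = (np.arange(n_grid) + 0.5) / n_grid
    Ft = F(t)
    dz = rng.standard_normal((n_rep, n_grid)) / np.sqrt(n_grid)
    integrals = dz @ Ft                       # n_rep realizations of int F dz
    return np.mean(integrals ** 2), np.mean(Ft ** 2)
```

The two returned numbers approximate the left- and right-hand sides of (9.13) and agree up to Monte Carlo error.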

Proof of upper bounds for the risk.
Proof of Theorem 2. Denote
$$\chi_{\varepsilon,A} = A^{-2} \varepsilon^2 \ln(1/\varepsilon), \quad (9.16)$$
define $j_0$ and $j'_0$ by (9.17), and observe that, with $J$ and $J'$ given by (4.3), the estimation error can be decomposed into the sum of four components $R_1, \ldots, R_4$. For $R_1$, using (4.1), we derive the required bound as $\varepsilon \to 0$. To bound $R_4$, we apply Lemma 2 and use (4.3), obtaining the analogous bound as $\varepsilon \to 0$. Then, our objective is to prove that, as $\varepsilon \to 0$, one has $R_i = O\left( A^2 \chi^d_{\varepsilon,A} [\ln(1/\varepsilon)]^{d_1} \right)$. Now, note that each of $R_2$ and $R_3$ can be partitioned into the sum of two errors, $R_{i1}$ and $R_{i2}$. Combining (9.22) and (9.24), and applying the Cauchy-Schwarz inequality and Lemma 3 with $\alpha = 1/2$, one derives a bound for $R_{21} + R_{31}$; hence, due to condition (4.6), the required bound holds as $\varepsilon \to 0$. For the sum of $R_{22}$ and $R_{32}$, using (4.1) and (4.2), we obtain a bound in terms of a quantity $\Delta$. Then $\Delta$ can be partitioned into the sum of three components, $\Delta_1$, $\Delta_2$ and $\Delta_3$, according to three different sets of indices, where $d$ is defined in (3.7). It is easy to see that, for $\Delta_1$ given in (9.28) and $j_0$ and $j'_0$ given by (9.17), the required bound holds as $\varepsilon \to 0$; the bound for $\Delta_2$ defined in (9.29) is obtained similarly. In order to construct upper bounds for $\Delta_3$ in (9.30), we need to consider three different cases.
Proof of Theorem 4. Repeating the proof of Theorem 1 with $j'$ and $k'$ replaced by the vectors $j'$ and $k'$, respectively, and $s_2 j'$ replaced by $(j')^T s'_2$, we again arrive at two cases. Denote by $e$ the $r$-dimensional vector with all unit components.
Proof of Theorem 5. Repeat the proof of Theorem 2 with $j'$ and $k'$ replaced by the vectors $j'$ and $k'$, respectively, $s_2 j'$ replaced by $(j')^T s'_2$, and the smoothness parameters replaced by their componentwise analogues $s_{2,l}$, $l = 1, \cdots, r$.