Minimax optimal procedures for testing the structure of multidimensional functions

We present a novel method for detecting structural characteristics of multidimensional functions. We consider the multidimensional Gaussian white noise model with an anisotropic estimand. Using the relation between the Sobol decomposition and the geometry of a multidimensional wavelet basis, we build test statistics for any of the Sobol functional components. We assess the asymptotic minimax optimality of these test statistics and show that they are optimal, in the presence of anisotropy, with respect to the newly determined minimax rates of separation. An appropriate combination of these test statistics allows testing general structural characteristics such as the atomic dimension or the presence of some variables. Numerical experiments show the potential of our method for studying spatio-temporal processes.


Introduction
Multidimensional data often exhibit a simpler underlying structure, meaning that their effective dimension is smaller than the observed dimension. Detecting the presence of such structure can enhance the understanding of the data and allows for more effective modeling and inferential strategies. There is a flourishing literature dealing with nonparametric methods for structure detection. These contributions are concerned with different types of structures (such as additivity, small atomic dimension and variable selection) and with different modeling approaches according to the nature of the noise, the smoothness assumptions, etc. A brief overview of the most directly related contributions is given hereafter. The main characteristic of this paper is to provide a rigorous theoretical study of structure detection in the presence of an anisotropically smooth estimand. We consider a multidimensional Gaussian white noise model and derive test statistics for the atomic dimension of multidimensional objects (i.e., the maximal degree of interaction between the variables) and for variable selection. Then, following the sieve estimation of Birgé and Massart (1998), we build a data-driven procedure. It is first tested in 'idealistic' numerical experiments before being applied to the more sophisticated context of time series data analysis.
The main ingredient of our methodology is to project the data onto a hyperbolic (wavelet) basis and to build test statistics based on the projection coefficients. Its motivation lies in the relation that emerges between the geometry of that basis and the 'functional components' of the Sobol decomposition of the estimand (Sobol, 1969). The use of an appropriate basis allows one to benefit from a sparse representation and an optimal adaptation to the anisotropic smoothness of the estimand. In this paper we construct optimal testing procedures for the functional components according to the minimax hypothesis testing framework (Ingster, 1993a,b,c). Appropriately combined, these functional components can be rephrased in terms of more general structures such as the atomic dimension. This project is motivated by the recent results of Autin et al. (2014, 2015), who studied the ability of a tensor-product wavelet basis, the so-called hyperbolic wavelet basis, to estimate multidimensional functions having anisotropic smoothness. Tensor-product bases (Fourier or wavelet) have already been widely used in signal detection, notably to test for additivity (i.e., atomic dimension equal to 1) as in De Canditiis and Sapatinas (2004). Nevertheless, the latter does not provide deep theoretical results on the performance of the method. Abramovich et al. (2008) describe another procedure for testing additivity. They propose an adaptive procedure based on the thresholding of wavelet coefficients and derive interesting theoretical results. Nevertheless, they exploit a standard (isotropic) wavelet basis that cannot optimally deal with the more realistic situation of anisotropic estimands. Moreover, their method is limited to testing for additivity and does not exploit the full directional representation of a multidimensional wavelet basis.
The functional framework (where the dimension d → ∞) has also been investigated by Gayraud and Ingster (2012), who studied the problem of signal detection in the case of sparse additive functions. Other developments include, for example, multichannel signal detection (Ingster and Suslina, 2005) and detection in inverse models. Comminges and Dalalyan (2013) made profound theoretical contributions on hypothesis testing procedures based on quadratic functionals. They study various testing problems involving multidimensional anisotropic functions. They exploit a tensor-product Fourier basis to build non-adaptive procedures, i.e., procedures that are built on the knowledge of the regularity parameters of the anisotropic function spaces in the alternative. In this paper, we focus on the consequences of dealing with anisotropic estimands. Therefore, we define two classes of testing procedures, referred to as minimax and adaptive minimax optimal methods for the Besov balls; see Section 4. The former methodology is built using the full knowledge of the regularity parameters while the latter uses only a part of it.
These methods differ by the nature of the truncation applied to the hyperbolic wavelet coefficient sequence. An intuitive way to think about this is to consider the better-known multidimensional function estimation context. The truncation rule characterizing the minimax optimal method can be viewed as a linear but directionally dependent smoothing, while the truncation rule of the adaptive minimax optimal method finds its roots in the approximation of the hyperbolic cross that is naturally associated with tensor-product spaces (Schmeisser and Sickel, 2004; Schmeisser, 2006; Sickel and Ullrich, 2009). Tensor-product spaces are also often considered in various statistical contexts, such as estimation with the tensor-product space ANOVA model (Lin, 2000) or signal detection (Ingster and Stepanova, 2009). They are of great interest since one can hope to reach performances close to the unidimensional case. This is sometimes promoted as a way to 'tackle' the curse of dimensionality; nevertheless, it is a restrictive assumption. In the context of functional ANOVA, for example, assuming a tensor-product model means that the interaction term's complexity has to be inversely proportional to its degree. In this paper we do not restrict the structure detection to functions belonging to tensor products of Besov spaces, but consider larger classes such as the anisotropic Nikolskii class. We show that our methods attain the optimal separation rates between the null and the alternative hypotheses, and we discuss how to combine sets of testing procedures in a way that is adapted to the anisotropic case.
Such testing procedures find applications in many fields. We illustrate their usage in time series data analysis. Our aim is to test the structure of the spatio-spectrum associated with a spatio-temporal process that is time stationary. This also allows us to illustrate how to adapt such testing procedures in the context of multidimensional data where the number of observations is very large, making p-values useless. This paper is structured as follows. First, in Section 2, we give the fundamental knowledge on hyperbolic wavelet bases and their relation to the Sobol decomposition of a multidimensional function. In Section 3 we introduce the multidimensional Gaussian white noise model and the minimax hypothesis testing framework. Then we describe our test statistics for Sobol functional components in Section 4 before introducing tests for global structural characteristics in Section 5. Section 6 describes our data-driven adaptation of the adaptive minimax optimal procedure and its validation in a practical setting, and a final discussion is provided in Section 7. The proofs of the theoretical results are postponed to the appendix.

Hyperbolic wavelet bases
We start from a one-dimensional periodized wavelet basis B_1 of L_2([0, 1)) which is built from the dilations and translations of a scaling function φ and a wavelet ψ with V (for some V ≥ 1) vanishing moments. A d-dimensional hyperbolic wavelet basis B_d results from forming d-variate functions as appropriate products of the univariate functions φ and ψ: for an orientation i = (i_1, . . . , i_d) ∈ {0, 1}^d, directional dilations j = (j_1, . . . , j_d) and translations k = (k_1, . . . , k_d), the basis function ψ^i_{j,k} is the tensor product whose u-th factor is a dilated and translated copy of ψ when i_u = 1 and of φ when i_u = 0. The basis B_d is thus generated by a scaling function φ_{0,0} and translated and dilated wavelet functions ψ^i_{j,k}. If all the components of the vector of directional dilations j are equal, this results in the standard (isotropic) wavelet basis. Hereafter we consider that the components of j can take different values. The wavelet functions are then supported on hyper-rectangles and the resulting wavelet basis, the so-called hyperbolic or tensor-product wavelet basis, is proven to be able to optimally deal with anisotropic functions (Autin et al., 2014, 2015). We say that each of the 2^d possible values of i defines an orientation. In such a basis, any f ∈ L_2([0, 1)^d) can be decomposed onto φ_{0,0} and the functions ψ^i_{j,k}, with coefficients given by the L_2-scalar product ⟨., .⟩_2. This representation facilitates the characterization of a specific structure, such as additivity. Indeed, for an additive function f_add, that is, a function whose Sobol decomposition only contains univariate components, the wavelet coefficients ⟨f_add, ψ^i_{j,k}⟩_2 in each orientation i with |i| = i_1 + i_2 + · · · + i_d > 1 are exactly equal to 0. Information about the component functions f_i in equation (1) can be retrieved via the wavelet coefficient sequence through the geometry of the hyperbolic wavelet basis. This enables the design of testing procedures for structural information on multivariate functions that are more general than merely testing for additivity. Dalalyan et al. (2014) introduced the atomic dimension in the physical domain, which we denote by δ(f). It reflects the maximal degree of interaction between the d variables in the Sobol decomposition.
For instance, the additive model f_add has δ(f_add) = 1, while a function whose Sobol decomposition contains a bivariate interaction term, and no term of higher order, has atomic dimension 2. In the sequel, we use the definition from Autin et al. (2014) of the atomic dimension in the wavelet coefficient domain, relating the orientations of the nonzero wavelet coefficients to the corresponding Sobol functional components.
Definition 2.1 Let f ∈ L_2([0, 1)^d) be decomposed in the hyperbolic wavelet basis B_d. The atomic dimension of f in the coefficient domain, denoted δ_{B_d}(f), is the largest value of |i| over the orientations i containing nonzero wavelet coefficients of f. Definition 2.1 gives us more flexibility than the definition in the physical domain. Indeed, the atomic dimension δ_{B_d} depends on the number of vanishing moments (even directional ones) of the considered basis B_d. More precisely, the atomic dimension δ(f) of a d-dimensional function f in the physical domain and the atomic dimension δ_{B_d}(f) of the same function f in the coefficient domain are tied by the following inequality: δ_{B_d}(f) ≤ δ(f). We mention here two situations to emphasize the practical importance of being able to characterize the atomic dimension in the coefficient domain. The first example concerns multidimensional function estimation. Autin et al. (2014) introduced a novel estimation procedure based on the thresholding of the hyperbolic wavelet coefficients. They show that it outperforms standard hard thresholding whenever some structural conditions on the estimand are satisfied. These conditions also include a constraint on the value of the atomic dimension δ_{B_d}. Here we provide a test statistic to test the structure of the estimand before applying the aforementioned estimation method. A second example concerns the exploitation of sparsity in statistical modeling, which can be useful for building a statistical model using the hyperbolic wavelet coefficients or in generalized linear modeling. For example, Zhou et al. (2013) describe such a generalized linear model with multidimensional covariates; their first step consists in dimension reduction by tensor decomposition. One could instead exploit the sparsity of the hyperbolic wavelet sequence. In such a case, one may seek the hyperbolic wavelet basis that provides the sparsest representation.
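The relation between orientations and Sobol functional components can be checked numerically. The following sketch (our own illustration, using the hyperbolic Haar basis on a dyadic grid; all function names are ours) verifies that an additive bivariate function has exactly zero coefficients in the interaction orientation i = (1, 1):

```python
import numpy as np

def haar_1d(v):
    """Full-depth orthonormal 1-D Haar transform (length must be a power of two).
    Index 0 holds the scaling coefficient; indices >= 1 hold wavelet coefficients."""
    out = v.astype(float).copy()
    n = out.size
    while n > 1:
        a = (out[0:n:2] + out[1:n:2]) / np.sqrt(2.0)  # local averages
        d = (out[0:n:2] - out[1:n:2]) / np.sqrt(2.0)  # local details
        out[: n // 2], out[n // 2 : n] = a, d
        n //= 2
    return out

def hyperbolic_haar_2d(F):
    """Tensor-product (hyperbolic) transform: the 1-D transform is applied
    along each axis independently, to full depth in each direction."""
    C = np.apply_along_axis(haar_1d, 0, F)
    return np.apply_along_axis(haar_1d, 1, C)

# Additive function f(x1, x2) = g(x1) + h(x2) sampled on a dyadic grid.
N = 64
x = (np.arange(N) + 0.5) / N
F = np.sin(2 * np.pi * x)[:, None] + (x ** 2)[None, :]

C = hyperbolic_haar_2d(F)
# Orientation (1, 1) = wavelet factor in both directions: rows >= 1 and columns >= 1.
interaction_energy = np.sum(C[1:, 1:] ** 2)
print(interaction_energy)  # numerically zero for an additive function
```

Since the transform is orthonormal, the total energy of the coefficients equals that of the samples, while all of it concentrates in the orientations (0, 0), (1, 0) and (0, 1) here.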

Model and minimax approach for hypothesis testing
The observed signal under the Gaussian white noise model is a realization of the process Y_ε, where dY_ε(x) = f(x) dx + ε dW(x) for x ∈ [0, 1)^d, W is the Brownian sheet and ε is the noise level, considered here to be close to zero. For technical reasons (see the proofs of Proposition 9.3 and Proposition 9.4), without loss of generality, we assume that it is smaller than ε_o = e^{−e}, where e denotes Euler's number. We consider the sequential version of the Gaussian white noise model, which consists of observations of the empirical scaling and wavelet coefficients: each true coefficient is observed up to an additive noise term ε ξ (respectively ε ξ^i_{j,k}), where ξ, ξ^i_{j,k} are i.i.d. N(0, 1) and (j, k) ∈ N^d × K_j. For any chosen orientation i ≠ 0, we may test the null hypothesis H_{i,0} against one of the two alternative hypotheses H_{i,a} and H̄_{i,a}. In these hypotheses, C is a positive constant which does not depend on ε, and r_{ε,i}, r̄_{ε,i} are continuous sequences of positive real numbers tending to 0 as ε goes to 0. F_s(R) denotes a ball of radius R in a function space characterized by a vector of directional regularities s, with harmonic sum in the orientation i denoted by γ_i^{−1}. Given a simultaneous control on the probabilities of type I and type II errors, we consider the asymptotic minimax setup: we aim at providing, for any orientation i, the order of the optimal separation rates, in the sense of Section 2.6 in Ingster and Suslina (2003), between the null hypothesis H_{i,0} and each one of the alternative hypotheses H_{i,a} and H̄_{i,a} associated with a particular choice of F_s when: • s is known; • only the harmonic sum γ_i^{−1} = Σ_u i_u s_u^{−1} of s with respect to the orientation i is known. Related to this, we search for both a sequence of F_s-minimax optimal testing procedures and a sequence of F_s-adaptive minimax optimal testing procedures.
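A minimal simulation of the sequence-space observations reads as follows (a sketch of our own; the coefficient array and its size are illustrative choices):

```python
import numpy as np

def observe_coefficients(theta, eps, rng):
    """Sequential Gaussian white noise model: each true coefficient
    is observed with i.i.d. N(0, eps^2) noise added."""
    return theta + eps * rng.standard_normal(theta.shape)

rng = np.random.default_rng(0)
eps = 0.01             # noise level; the theory assumes eps < e**(-e), about 0.066
theta = np.zeros(256)  # under H_{i,0} the coefficients in orientation i all vanish
y_hat = observe_coefficients(theta, eps, rng)
print(np.mean(y_hat ** 2))  # close to eps**2 under the null
```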
Definition 3.1 We say that r_{ε,i} corresponds to the F_s-minimax rate separating the hypotheses H_{i,0} and H_{i,a}, up to a constant, if the two following statements are satisfied: 1. (Upper bound) for any C > C_{i,α}, with C_{i,α} an absolute positive constant, there exists a sequence of testing procedures (∆*_{ε,α})_ε whose type I and type II error probabilities are asymptotically controlled; 2. (Lower bound) for any C < c_{i,α}, with c_{i,α} an absolute positive constant, no sequence of testing procedures achieves this control. The sequence of testing procedures (∆*_{ε,α})_ε is said to be F_s-minimax optimal.
Definition 3.2 Let α ∈ (0, 1/2). Consider i ∈ {0, 1}^d such that |i| > 1 and let γ_i > 0. We say that r̄_{ε,i} corresponds to the F_s-adaptive minimax rate separating the hypotheses H_{i,0} and H̄_{i,a}, up to a constant, if the two following statements are satisfied: 1. (Upper bound) for any C > C̄_{i,α}, with C̄_{i,α} an absolute positive constant, there exists a sequence of testing procedures (∆̄_{ε,α})_ε whose type I and type II error probabilities are asymptotically controlled; 2. (Lower bound) for any C < c̄_{i,α}, with c̄_{i,α} an absolute positive constant, no sequence of testing procedures achieves this control. The sequence of testing procedures (∆̄_{ε,α})_ε is said to be F_s-adaptive minimax optimal.
In the latter situation, we only have information about the harmonic sum of the directional smoothness parameters of the function class in the alternative. To determine the minimax and adaptive minimax rates, we naturally consider the combination of directional smoothnesses with a given harmonic sum that gives the worst type II error rate. The fully adaptive case corresponds to an unknown value of γ_i. The direct extension studied here, where γ_i is known, could rather be called partially adaptive. In this work, for the sake of simplicity, we write adaptive instead of partially adaptive.
Remark 3.1 The definitions of the minimax and adaptive minimax rates are equivalent to the ones given in Section 1 of Lepski and Tsybakov (2000) and slightly more precise than the ones given in Section 2.6 of Ingster and Suslina (2003). We will show that both the minimax rate and the adaptive minimax rate of testing depend on the smoothness of the underlying functional space and on the noise level ε. At the end of the appendix, we highlight the relationships between the absolute constants C_{i,α}, c_{i,α}, C̄_{i,α}, c̄_{i,α} and the parameters appearing in the problem, such as R and the smoothness.

Nonparametric tests for a given orientation
In this paragraph we present the test statistics and testing procedures that are used in the sequel. These test statistics provide sequences of testing procedures that are minimax and adaptive minimax optimal in the particular case where F_s corresponds to a Besov space with smoothness parameter s.
Definition 4.1 (Test statistic) Let i ≠ 0 and j = (j_1, . . . , j_d) ∈ J_i. The nonnegative random variable given by the empirical energy of the coefficients in orientation i up to level j is called the i-oriented test statistic with level parameter j.
4.1 B^s_{2,∞}-minimax optimal sequence of testing procedures

We define appropriate test statistics based on the estimated empirical energy, that is, the sum of the squared values of the empirical hyperbolic wavelet coefficients within the specified orientation over certain scales. We show that the sequence of tests constructed using hyperbolic wavelet bases attains optimal minimax properties when f belongs to a Besov ball with smoothness parameter s.
In this section we assume full knowledge of the smoothness parameter s of the functions in the alternative; in other words, this corresponds to the non-adaptive setup. We define a vector j* = (j*_1, . . . , j*_d) ∈ J_i of truncation scales as in (6). Theorem 4.1 Let α ∈ (0, 1/2), R > 0, s ∈ (0, +∞)^d and i ∈ {0, 1}^d \ {0}. Consider, as in (3) and (4), testing the hypotheses H_{i,0} versus H_{i,a} where F_s(R) = B^s_{2,∞}(R). Then the sequence of (i, j*, t_{ε,i,α})-testing procedures (∆_{i,j*}(t_{ε,i,α}))_ε is B^s_{2,∞}-minimax optimal with j* as in (6), and with ε^{−2} t_{ε,i,α} = χ_{1−α/2}(2^{|j*|}), the 1 − α/2 quantile of the chi-square distribution with 2^{|j*|} = 2^{j*_1+···+j*_d} degrees of freedom. Moreover, the B^s_{2,∞}-minimax rate separating H_{i,0} and H_{i,a} is of the order of ε^{4γ_i/(1+4γ_i)}. The proof of Theorem 4.1 is given in the appendix. The absolute constants C_{i,α} and c_{i,α} from Definition 3.1 will be given there.
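The (i, j*, t_{ε,i,α})-testing procedure of Theorem 4.1 can be sketched as follows, assuming the retained empirical coefficients for orientation i have been collected in a flat array. To keep the sketch dependency-free, the chi-square quantile is approximated by the Wilson–Hilferty formula (scipy.stats.chi2.ppf would be the exact alternative); all function names are ours:

```python
import math
import numpy as np
from statistics import NormalDist

def chi2_quantile(p, k):
    """Wilson-Hilferty approximation to the chi-square p-quantile with k d.o.f.,
    very accurate for moderate and large k."""
    z = NormalDist().inv_cdf(p)
    return k * (1.0 - 2.0 / (9.0 * k) + z * math.sqrt(2.0 / (9.0 * k))) ** 3

def oriented_test(y_hat, eps, alpha):
    """i-oriented test: reject H_{i,0} when the empirical energy of the retained
    coefficients exceeds eps^2 times the chi-square quantile, as in Theorem 4.1."""
    dof = y_hat.size  # plays the role of 2^{|j*|}
    t = eps ** 2 * chi2_quantile(1.0 - alpha / 2.0, dof)
    return int(np.sum(y_hat ** 2) > t)

# Under the null the statistic is exactly eps^2 * chi-square(dof), so the
# rejection probability equals alpha/2 by construction.
rng = np.random.default_rng(1)
eps, alpha, dof, M = 0.01, 0.10, 64, 4000
rejections = sum(oriented_test(eps * rng.standard_normal(dof), eps, alpha)
                 for _ in range(M))
print(rejections / M)  # close to alpha/2 = 0.05
```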
This theorem can be compared to the results given in the book of Ingster and Suslina (2003) for Besov bodies in the one-dimensional case. In this book, Theorem 3.9 states distinguishability conditions and sharp asymptotics are given in Theorem 4.4. The result stated above in Theorem 4.1 is similar but holds in any dimension; the smoothness parameter is therefore replaced by the average smoothness value obtained from the harmonic sum.
The reader may note that the worst rate is associated with γ_1, where 1 = (1, . . . , 1), i.e., with the interaction term of highest degree. This case is similar to the one considered in Ingster and Stepanova (2011) (Remark 3.3). We obtain the same minimax rate of testing hypotheses involving the harmonic sum of smoothness as the one they found for a signal function f in a Sobolev space. Nevertheless, we point out that their null hypothesis H_0 : f = 0 differs from the one considered here, briefly reformulated as H_{1,0} : f_1 = 0, which consists of functions with an atomic dimension strictly smaller than the dimension d. The separation rate can be different for every orientation. When it comes to testing the structure of a d-dimensional function, a sequential testing procedure based on these different test statistics seems ideal because one can benefit from better separation rates.
Remark 4.1 In the context of multidimensional function estimation, the maxiset approach introduced by Kerkyacharian and Picard (2000) has proved useful for studying the performance of wavelet-based estimators (Autin et al., 2014). It consists in determining the largest function space over which the risk of an estimator is controlled at a chosen rate. While the choice of the rate is often an issue, it is common to use the minimax or near-minimax rates over some 'standard' functional spaces F. The maxiset approach then reveals a set of functions that may or may not contain F strictly. This is an optimistic approach in the sense that, finally, we can estimate more functions at the minimax or near-minimax rate than previously thought. In minimax hypothesis testing problems, we can draw inspiration from the maxiset approach to remark that, under the alternative, we naturally model the 'estimand' as a function in a d-variate smoothness space, here B^s_{2,∞}(R). Nevertheless, since we are only concerned with the presence of information along given directions, the behavior along the directions that are not of interest can be left 'uncontrolled'. Hence, the sequence space in the alternative can be enlarged to a set of Besov spaces sharing the same directional regularities in the orientations of interest.

4.2 B^s_{2,∞}-adaptive minimax optimal sequence of testing procedures

We now suppose that the regularity parameter s appearing in the alternative hypothesis is unknown, but that its harmonic sum γ_i^{−1} = Σ_{u=1}^d i_u s_u^{−1} is known for a chosen i such that |i| > 1. In some sense, we now consider the adaptive setup.
To define the test statistic using the knowledge of γ_i, we denote by J_i the integer defined by the truncation rule of the adaptive procedure. Theorem 4.2 Let α ∈ (0, 1/2), R > 0, i ∈ {0, 1}^d satisfying |i| > 1 and let γ_i > 0. Consider, as in (3) and (5), testing the hypotheses H_{i,0} versus H̄_{i,a}. The proof of Theorem 4.2 is given in the appendix. The absolute constants C̄_{i,α} and c̄_{i,α} from Definition 3.2 will be given there.
The results given for adaptation in Theorem 4.2 can be compared to the usual results obtained for adaptation in hypothesis testing problems. The unavoidable loss due to adaptation is the usual one, which can also be found, for example, in Theorem 7.2 of Ingster and Suslina (2003). The results stated above in Theorem 4.2 can also be compared to the loss obtained by Spokoiny (1998) in the one-dimensional Gaussian white noise model for the signal detection problem.
Remark 4.2 Note that, compared to the B^s_{2,∞}-minimax case, the optimal separation rate is slower because log log ε^{−1} → ∞ as ε → 0. This attests to the deterioration of the information about the function space in the alternative. Nevertheless, in the alternative hypothesis, the union of sequence spaces B^s_{2,∞}(R) can be replaced by another, larger sequence space, denoted A_{γ_i,2}(R) and defined hereafter.
The truncation ball A_{γ_i,2}(R) is similar to a Besov ball in the sense that it controls the decay of the energy of the hyperbolic wavelet coefficients over the scales; hence it is used to control the approximation error. Following Lemma 2.2 of Neumann (2000), the following inclusion property holds.
Considering the functional components f_i of f not only enables us to explore fine structures; it is also a way to improve the performance of the testing procedures for more general structures, see Section 6.

Tests that combine different orientations
Many interesting hypotheses on the structure of a function can be tested by 'aggregating' the previously described optimal testing procedures for the Sobol functional components. In this paragraph we describe such results in the context of the B^s_{2,∞}-minimax framework, but they can easily be given for the B^s_{2,∞}-adaptive minimax setting as well. We first state the following proposition, where |I| denotes the cardinality of a nonempty set of orientations I that characterizes the alternative hypothesis.
Proposition 5.1 Consider the testing hypotheses in (8). Then, for any ε ∈ (0, ε_o), the testing procedure Λ_{I,ε,α} = max_{i∈I} ∆_{i,j*}(t_{ε,i,α}) is β-level, i.e., its probability of a type I error is equal to β.
The proof of Proposition 5.1 is an immediate consequence of Theorem 4.1. Since the testing procedures involved are independent across orientations, the limiting distributions can be computed exactly and we do not need a probably conservative FWER correction.

Remark 5.1 Under the alternative, we naturally model the estimand as a function in a d-variate smoothness space, here once again a Besov ball B^s_{2,∞}(R). Nevertheless, inspired by Remark 4.1, we can state another analogy with the maxiset approach. Let us consider the testing problem with the same hypotheses as in (8) but with a rate chosen to be the worst optimal one with respect to s, that is, r_{ε,1} = ε^{4γ/(1+4γ)}. Since we are only concerned with the presence of information along at least one given direction belonging to I, no assumption on the smoothness is needed in the other orientations. Hence, the sequence space in the alternative associated with the chosen rate r_{ε,1} can be considered larger than the initial one, B^s_{2,∞}(R). More precisely, it could be replaced by the sequence space defined by the union of all Besov balls B^s_{2,∞}(R) with s ∈ (0, +∞)^d such that Σ_u i_u s_u^{−1} = γ^{−1} for at least one i ∈ I.
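Because the single-orientation procedures are independent, the overall level of the max-combined test can be set exactly by a Šidák-type calibration of the per-orientation level (an illustrative computation of our own, consistent with the exact level claimed in Proposition 5.1):

```python
def per_orientation_level(beta, n_orientations):
    """Per-orientation level alpha such that the max-combined test over
    n independent orientations has overall type I error exactly beta:
    1 - (1 - alpha)^n = beta."""
    return 1.0 - (1.0 - beta) ** (1.0 / n_orientations)

beta, n = 0.05, 4
alpha = per_orientation_level(beta, n)
combined = 1.0 - (1.0 - alpha) ** n  # P(at least one of n independent tests rejects)
print(alpha, combined)               # combined recovers beta exactly
```

A Bonferroni correction (alpha = beta / n) would be slightly conservative here; independence across orientations makes the exact calibration available.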
In what follows, we present some examples where optimal testing procedures that combine different orientations can be useful.

Testing the structure of the goal function
Take the set I(δ_0) = {i ∈ {0, 1}^d : |i| > δ_0}, which contains all orientations for which the number of contributing components of the d-vector is strictly larger than a specified value δ_0. The corresponding hypotheses for testing the atomic dimension are given in (9). Tests for the atomic dimension are more versatile than merely testing for additivity, the latter being a special case with δ_0 = 1. Concluding that a structure is simpler than a full d-dimensional object leads to an efficiency gain in estimation: using appropriate methods, one escapes (at least partly) the curse of dimensionality when δ(f) is smaller than d. Combining tests over multiple orientations i ∈ I can be done by computing the maximum of the testing procedures in the separate orientations. From our testing procedures ∆_{i,j*}(t_{ε,i,α}), i ≠ 0, there is a quite natural way to estimate the structure of the goal function f: the set of orientations in which the test rejects provides an estimator Â_{f,t_{ε,i,α}} of A_f. Nevertheless, this estimator could lead to many false discoveries, that is, orientations i ∈ Â_{f,t_{ε,i,α}} such that f_i = 0. Obviously, the larger α, the bigger the expected number of false discoveries. There is also a natural way to obtain an estimator δ̂ of the atomic dimension δ(f) of a d-dimensional signal f: for a chosen α ∈ (0, 1/2), take the largest |i| among the rejected orientations. Rejecting a combined-orientations null hypothesis as in (9) when δ̂_{B_d} ≥ δ_0 is equivalent to rejecting H_0 when Λ_{I(δ_0),ε,α} = 1.
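The estimator δ̂ described above can be sketched as follows, with `rejections` a hypothetical record of the outcomes of the single-orientation procedures (the function name and the default value for the no-rejection case are our own illustrative choices):

```python
def estimate_atomic_dimension(rejections):
    """rejections: dict mapping each tested orientation i (a 0/1 tuple) to the
    0/1 outcome of the corresponding single-orientation test. The estimator is
    the largest |i| among rejected orientations, defaulting to 1 (additivity)
    when no orientation with |i| > 1 is rejected."""
    active = [sum(i) for i, rejected in rejections.items() if rejected]
    return max(active, default=1)

# Hypothetical outcomes for d = 3: only the (x1, x2) interaction is detected.
outcomes = {(1, 1, 0): 1, (1, 0, 1): 0, (0, 1, 1): 0, (1, 1, 1): 0}
print(estimate_atomic_dimension(outcomes))  # 2
```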

Reduction of model: selection of variables
Another application is to test whether some variables among x_1, . . . , x_d may be omitted from the model. Consider the case of leaving out a variable x_m for some m ∈ {1, . . . , d}; the set of orientations to test then gathers all orientations involving the m-th direction. In addition to possibly reducing the model, such a testing problem can find applications in modeling time series (see Section 6 for details).
Following the two previous paragraphs, an interesting point must be underlined here. Estimating the atomic dimension of a d-dimensional object or reducing the number of its relevant variables can be considered a first step towards guaranteeing that the object is a function of a number of variables smaller than d (usually called a 'sparse variable function'). This assumption has often been made in recent works, in estimation problems as in Autin et al. (2014) and in testing problems as in Ingster and Suslina (2015). From the data, our methodology also permits testing whether this assumption is reasonable.

Numerical experiments
In this section we first describe a practical implementation of the adaptive method. We present its performance in terms of empirical power curves with respect to the signal-to-noise ratio, for testing the atomic dimension of multidimensional functions corrupted by additive Gaussian white noise. Then, we propose an application in time series analysis that consists in exploring the structure of the spatio-spectrum of spatio-temporal processes.

Data-driven structure detection
To link our theory to the practical setting, we refer the reader to Reiss (2008) for the equivalence, in Le Cam's sense, between the multidimensional nonparametric regression problem (11), with i.i.d. N(0, 1) errors ζ_{l_1,...,l_d}, and the Gaussian white noise model (2). This equivalence is assumed to hold; it typically relies on some minimal regularity conditions and an appropriate calibration of the noise level ε = σN^{−d/2}. Both the B^s_{2,∞}-minimax and the B^s_{2,∞}-adaptive minimax optimal procedures studied previously truncate the wavelet coefficient sequence at scales that are calibrated using the unknown smoothness of the estimand (either all the directional regularities or only their harmonic sum). Following the idea of sieve estimation described in Birgé and Massart (1998), we build a data-driven version of the adaptive testing procedure. In this numerical part, we consider only the B^s_{2,∞}-adaptive minimax optimal method, since its algorithmic complexity is about O(log_2(N)) compared to O((log_2(N))^{|i|}) for the B^s_{2,∞}-minimax optimal method. We consider the sets of all possible adaptive models F_{M_A^N} based on ε^{−2} observations. An oracle choice of the nuisance parameters leads to an optimization problem that is solved for every i ∈ {0, 1}^d \ {0}. In practice, we plug in empirical quantities and adjust for the variability in the data, with λ̂ = σ̂ √(2dN^{−d} log N) the universal threshold.
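A loose sketch of the data-driven truncation based on the universal threshold follows. The energy-versus-noise-budget criterion below is a simplified stand-in of our own, not the exact penalized selection of this section; only the threshold formula λ = σ √(2dN^{−d} log N) is taken from the text:

```python
import numpy as np

def select_scales(energies, n_coeffs, lam):
    """Keep a scale block j in the model when its empirical energy exceeds the
    noise budget of its coefficients under the universal threshold lam.
    energies[j]: sum of squared empirical coefficients in block j;
    n_coeffs[j]: number of coefficients in that block."""
    return [j for j in range(len(energies)) if energies[j] > n_coeffs[j] * lam ** 2]

# Toy example on a dyadic grid with N = 64 points per direction, d = 2.
N, d, sigma = 64, 2, 1.0
lam = sigma * np.sqrt(2 * d * N ** (-d) * np.log(N))  # universal threshold
energies = [5.0, 0.8, 0.01, 0.002]                    # coarse-to-fine blocks (toy values)
n_coeffs = [1, 2, 4, 8]
print(select_scales(energies, n_coeffs, lam))          # coarse blocks survive, fine ones do not
```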

Empirical power functions
In this section we consider 3-dimensional functions f and we test for their atomic dimension: H_0 : δ(f) = 1 versus H_1 : δ(f) > 1. We therefore consider the hyperbolic Haar wavelet basis, so that δ_{B_d}(f) = δ(f). For the sake of simplicity, we generate the test functions using the Sobol decomposition of a d-variate function f ∈ L_2([0, 1)^d) into orthogonal summands of growing dimensions. The marginal functional components are chosen to be univariate functions that are frequently considered in the wavelet literature (Antoniadis et al., 2001). The interaction terms are obtained as weighted products of the marginal functions. We consider a null hypothesis function with δ(f) = 1 and two functions in the alternative with atomic dimension δ(f) = 2 or δ(f) = 3.

Figure 1 gives an illustration of some of these test functions. The total energy of the functions f(x_1, x_2, x_3) is normalized. The observed data are then generated by adding Gaussian white noise to the test functions. The empirical power is computed as a function of the SNR, where the signal-to-noise ratio (SNR) is defined as the ratio of the standard deviation of the function values to the standard deviation of the noise, and Λ^{(δ_0)}_{m,SNR_j} is the result of the procedure Λ_{I(δ_0),ε,α} (with δ_0 ∈ {1, 2, 3}) at SNR_j and the m-th Monte Carlo iteration. We generate these data sets with sample sizes N ∈ {32, 64}, 1 ≤ j ≤ 50, and with M = 100 Monte Carlo replications at every value of SNR_j. The results are summarized in Figure 2. Figure 2 shows that, for both test functions, the nominal level of the test at α = 5% under the null hypothesis H_0 is maintained over the different values of the SNR. Under the alternative, we observe, firstly, that detection appears at very low SNR values; secondly, that an increase in the sample size improves the detection power of the method; and thirdly, that under the two different alternatives, when the energy of the function is more spread over orientations (δ(f) = 3), the detection power decreases. These tests confirm the proper calibration of the data-driven adaptive method. We can now apply it to a more realistic data example. Neumann and von Sachs (2000) consider testing the stationarity of a time series by studying the structure of its time-varying spectrum. They form local contrast test statistics in the time direction, also based on hyperbolic wavelet coefficients of an estimator of the time-varying spectral density, namely the preperiodogram. A crucial problem for the practical application of this method in testing stationarity is the inherent nature of the preperiodogram data, which suffer from interferences, very low SNR and a non-Gaussian noise distribution.
This may strongly affect the performance of the data-driven choice of the truncation scales. Adapting our testing methodology to this context is an interesting research problem in itself. In the sequel, we test the structure of the spatio-spectrum of spatio-temporal processes that are stationary in time. A specific concern for this application is the high-dimensional setting: we face the problem of very small p-values caused by the large number of observations (Lin et al., 2013). We illustrate how to use our results properly to assess the structure of such data. We first define a class of spatio-temporal processes and its spatio-spectrum, explain how to estimate the latter, and finally apply our data-driven method to simulated data.
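The empirical-power computation used in the experiments above can be sketched in a few lines. This is a minimal illustration, not the paper's actual code: the callable `test` stands in for the procedure $\Lambda_{I(\delta_0),\varepsilon,\alpha}$, and all names are ours.

```python
import numpy as np

def empirical_power(f, test, snr_grid, M=100, seed=None):
    """Monte Carlo estimate of the rejection frequency of a binary test.

    f        : array of noiseless function values on the evaluation grid
    test     : callable mapping noisy observations to 0/1 (1 = reject H0);
               stands in for the paper's procedure Lambda_{I(delta0),eps,alpha}
    snr_grid : SNR values, with SNR = sd(f) / sd(noise)
    """
    rng = np.random.default_rng(seed)
    sigma_f = f.std()
    power = []
    for snr in snr_grid:
        sigma_noise = sigma_f / snr
        rejections = [test(f + sigma_noise * rng.standard_normal(f.shape))
                      for _ in range(M)]
        power.append(np.mean(rejections))  # proportion of rejections at this SNR
    return np.array(power)
```

Plotting the output of `empirical_power` against the SNR grid for each test function produces power curves of the kind shown in Figure 2.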

Spatio-temporal model
Following Ombao et al. (2008), we define the Cramér representation of a spatio-temporal process $X_t(u)$, $u = (u_1, u_2) \in [0,1)^2$, $1 \le t \le T$, where $u$ is the spatial index, as
$$X_t(u) = \int_{-\pi}^{\pi} A(u, \omega)\, e^{i \omega t}\, dZ(\omega),$$
where $Z(\omega)$ is a complex-valued stochastic process with zero mean and orthogonal increments, which satisfies
$$\mathrm{Cov}\big(dZ(\omega), dZ(\omega')\big) = \mathrm{Dirac}(\omega - \omega')\, d\omega\, d\omega',$$
where $\mathrm{Dirac}(\cdot)$ denotes the Dirac mass at 0, and $A(u, \omega)$ is the spatial complex-valued transfer function. It is Hermitian, $A(\cdot, -\omega) = A^*(\cdot, \omega)$, where $A^*$ is the adjoint operator. The location-dependent spectrum is given by $f(u, \omega) := |A(u, \omega)|^2$.

Estimation
In a practical setting we observe the spatio-temporal process over a discrete grid of dimensions $(N \times N \times T)$. Since we assume the process to be stationary in time, we can compute the bias-corrected log-periodogram at every spatial location,
$$Y(u, \omega) = \log I(u, \omega) + \kappa,$$
where $I(u, \omega)$ is the periodogram of the time series at location $u$ and $\kappa = 0.57721\ldots$ is the Euler-Mascheroni constant, as in Wahba (1980). From these spatial log-periodograms we test the structure of the spatio-spectrum $f(u, \omega)$. Hereafter we apply our methodology directly, without adaptation for the non-Gaussian nature of the log-periodogram ordinates. The central limit theorem in the coefficient domain allows us to consider the wavelet coefficients approximately normally distributed up to comparatively fine scales. The procedure could be improved in this specific context, for example by adapting the methodology of Gao (1997) for the penalization in the data-driven selection of the truncation scales. Nevertheless, omitting this adaptation lets us illustrate the practical application of our method with a slightly suboptimal procedure.
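A minimal sketch of this estimator for a single spatial location, assuming the standard periodogram normalization $I(\omega) = |\sum_t x_t e^{-i\omega t}|^2/(2\pi T)$; the function and variable names are ours, not the paper's.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant kappa

def log_periodogram(x):
    """Bias-corrected log-periodogram of a univariate time series.

    Since E[log I(omega)] = log f(omega) - kappa for Gaussian series
    (Wahba, 1980), adding kappa removes the bias. Returns values at the
    positive Fourier frequencies 2*pi*k/T, k = 1, ..., floor(T/2).
    """
    T = len(x)
    dft = np.fft.rfft(x - np.mean(x))
    I = np.abs(dft) ** 2 / (2 * np.pi * T)  # periodogram
    return np.log(I[1:]) + EULER_GAMMA      # drop frequency zero
```

In the spatio-temporal setting this would be applied to the time series at each of the $N \times N$ spatial grid points.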
In the previous section we checked that our method is well calibrated. In this time series example, we move away from that idealistic framework: we now deal with periodogram data that are non-Gaussian, and the spatio-spectrum may have a complicated structure (in terms of energy along the different orientations). In this high-dimensional and more realistic context, the p-values no longer have any practical importance. For (very) large sample sizes, p-values are almost always extremely small, suggesting rejection of the null hypothesis, since with real-life data the situation of a total absence of information in some orientation is rarely exactly true. A solution in this context is to use our test statistic values to describe the proportions of the total energy of the function that lie in the different orientations, similarly to principal component analysis. This is the approach adopted hereafter for the time series data examples.
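As a minimal sketch of this descriptive use of the test statistics (the function and labels are illustrative, not the paper's): given per-orientation energy estimates, report each orientation's share of the total, as one would report variance explained in PCA.

```python
def energy_proportions(energies):
    """Turn per-orientation energies (e.g. estimated ||f_i||^2 for each
    Sobol component i) into proportions of the total energy."""
    total = sum(energies.values())
    return {i: e / total for i, e in energies.items()}
```

For instance, `energy_proportions({'x1': 3.0, 'x1:x2': 1.0})` reports that 75% of the energy lies in the main effect of the first variable.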

Results
We generate the following time series models using a discretized version of their Cramér representation given by equation (14). We consider two spatial AR(1) processes, i.e., their spatio-spectrum is given by
$$f(u, \omega) = \frac{\sigma^2}{2\pi\, |1 - \phi_u e^{-i\omega}|^2}.$$
'Model 1' has a constant parameter over space, $\phi_u = \phi = 0.9$ for all $u \in [0,1]^2$, while 'Model 2' has a complex spatial AR(1) structure over space, as illustrated in Figure 3.
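A minimal sketch of these two models, assuming unit innovation variance and simulating the AR(1) recursion directly at each spatial grid point rather than through the discretized Cramér representation; all names are ours.

```python
import numpy as np

def ar1_spectrum(phi, omega, sigma2=1.0):
    """Spectral density of X_t = phi * X_{t-1} + e_t:
    f(omega) = sigma2 / (2*pi*|1 - phi*exp(-i*omega)|^2)."""
    return sigma2 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * omega)) ** 2)

def simulate_spatial_ar1(phi_field, T, seed=None):
    """Independent AR(1) series at each grid point, with a location-dependent
    coefficient phi_field (an N x N array); 'Model 1' corresponds to
    phi_field = 0.9 everywhere."""
    rng = np.random.default_rng(seed)
    N1, N2 = phi_field.shape
    x = np.zeros((T, N1, N2))
    e = rng.standard_normal((T, N1, N2))
    for t in range(1, T):
        x[t] = phi_field * x[t - 1] + e[t]
    return x
```

With $\phi = 0.9$ the spectrum concentrates its energy near frequency zero, which is what makes the structure of the spatio-spectrum detectable.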

Discussion
In this paper we described how the Sobol decomposition, in relation to the geometry of the multidimensional hyperbolic wavelet basis, allows us to build simple test statistics with strong theoretical performance in the presence of an anisotropic estimand. Appropriately combined, these test statistics allow us to test for general structural properties such as the atomic dimension. We determined the minimax rates for separating the null and alternative hypotheses when detecting a Sobol functional component, in cases of full or partial knowledge of the smoothness parameter. We proposed two sequences of testing procedures, based on these different levels of knowledge of the smoothness parameter, and showed that they are asymptotically optimal in the minimax sense. Interestingly, we observed that, on the one hand, a loss of information about the smoothness parameter is accompanied by a deterioration of the minimax rate of separation; on the other hand, the set of functions in the alternative hypothesis is larger. Finally, we described a methodology for a data-driven version of the testing procedure and its application to time series data. In the context of time series analysis, we consider extending our application example to deal with processes that are non-stationary in both space and time. We are then interested in the spatial time-varying spectrum of these processes. At every spatial location we can compute an estimator of the time-varying spectrum, such as the preperiodogram (Neumann and von Sachs, 2000). Neumann and von Sachs (1997) provide arguments for the asymptotic equivalence of the preperiodogram estimation problem, the Gaussian white noise model and the multidimensional nonparametric regression model, so that our theory can easily be extended to this context. Nevertheless, in a practical setting, the nature of the preperiodogram data requires the development of an adapted methodology for choosing the truncation scales.
In combination with the fully adaptive testing procedure, such a method holds promise for practical applications. Even in the context of estimation of the time-varying spectral density, the adaptive smoothing of the preperiodogram is still an active research topic (van Delft and Eichler, 2015).

Acknowledgments
Proof of Theorem 4.1
The proof of Theorem 4.1 is a direct consequence of Propositions 9.1 and 9.2, which respectively deal with the upper bound and the lower bound of the minimax result.
Proposition 9.1 Under the same assumptions as Theorem 4.1 and for any $C > C_{i,\alpha}$, the following inequalities hold.
Remark 9.1 The constant $C_{i,\alpha}$ corresponds to the absolute positive constant $C_{i,\alpha}$ appearing in Definition 3.1; its value is given at the end of the proof below.

By the Cramér-Chernoff method (see Chapter 2 of Massart, 2003), we easily obtain the following concentration inequality for the noncentral Chi-square distribution with $2^{|j^*|}$ degrees of freedom.
Note that $f_i$ belongs to $B^s_{2,\infty}(R)$ because $f$ does. Since $\|f_i\|_2^2 \ge C^2 r_{\varepsilon,i}^2$ and $\chi^2_{1-\alpha/2}(2^{|j^*|}) \le 2^{|j^*|} + 2\big(2^{|j^*|} \log(2\alpha^{-1})\big)^{1/2} + 2\log(2\alpha^{-1})$ (see the exponential inequality for the Chi-square distribution given in Lemma 1 of Laurent and Massart, 2000), one gets an upper bound of at most $(\log(2\alpha^{-1}))^{-1}$ provided that $C \ge C_{i,\alpha}$. The constant $C_{i,\alpha}$ is the largest value of $C$ such that the last inequality can be replaced by an equality. This leads to $P_f\big(\Delta_{i,j^*}(t_{\varepsilon,i,\alpha}) = 0\big) \le \alpha/2$ for any $C$ large enough, uniformly in $f$.
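As a sanity check, separate from the proof itself, the Laurent-Massart Chi-square tail bound invoked above can be verified by simulation; the function below and its parameter values are illustrative only.

```python
import numpy as np

def lm_tail_gap(d, x, n=200_000, seed=0):
    """Monte Carlo check of Lemma 1 of Laurent and Massart (2000):
    P(X >= d + 2*sqrt(d*x) + 2*x) <= exp(-x) for X ~ ChiSquare(d).
    Returns (empirical tail probability) - exp(-x), which should be <= 0."""
    rng = np.random.default_rng(seed)
    samples = rng.chisquare(d, size=n)
    threshold = d + 2 * np.sqrt(d * x) + 2 * x
    return np.mean(samples >= threshold) - np.exp(-x)
```

Taking $x = \log(2\alpha^{-1})$ in the bound turns it into the quantile inequality used in the proof.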
Note that the smaller $\alpha$, the larger $C_{i,\alpha}$. This is not surprising: when $\alpha$ is chosen small, the detection problem becomes harder. Nevertheless, the order of the minimax rate does not depend on $\alpha$. We also note that (15) is obtained because we assumed that $\varepsilon < e^{-e}$.
Proposition 9.2 Under the same assumptions as Theorem 4.1 and for any $C < c_{i,\alpha}$, the following inequality holds.
Remark 9.2 The constant $c_{i,\alpha}$ corresponds to the absolute positive constant $c_{i,\alpha}$ appearing in Definition 3.1; its value is given at the end of the proof below.
Proof: For any $\varepsilon \in (0, \varepsilon_o)$, consider the function $f_\zeta$ defined below, where $L > 0$ and $\zeta_k \in \{-1, 1\}$ for every $k$. The values $j^*_u$ are chosen as in the upper bound. First we check that $f_\zeta$ is in the alternative, that is, it belongs to the Besov ball of radius $R$ with the expected regularity parameter and is sufficiently separated from 0. According to the definition of the Besov ball with radius $R$, we have to verify the corresponding norm condition. Because of our choice of $\theta^i_{j^*,k}$, a first condition on the constant $L$ is that it must be chosen smaller than $R\sqrt{2}^{\,\min_u s_u}$; this ensures that $f_\zeta$ belongs to the Besov ball under consideration. Next we compute the square of the $L^2$-distance between $f_\zeta$ and 0.

Therefore $L$ has to be larger than $2^{|i|\gamma_i} C$; this choice guarantees that $\|f_\zeta\|_2 \ge C\, r_{\varepsilon,i}$. Following Propositions 2.11 and 2.12 of Ingster and Suslina (2003), we have the bound below, where $\pi$ is any prior probability measure concentrated on the set of alternatives $A_i(R, C, s, r_{\varepsilon,i})$. Therefore, according to (16), it suffices to prove that, for a judicious choice of $\pi$, we can get $\mathbb{E}\big[(\mathbb{E}_\pi L_{f_\zeta})^2\big] < 4(1-\alpha)^2 + 1$.
We now consider the prior probability measure $\pi$ under which the $\zeta_k$'s are independent Rademacher random variables with parameter $1/2$. As a first step we take the expectation with respect to the prior probability measure $\pi$.
This yields an expression for $\mathbb{E}_\pi L_{f_\zeta}$ as an exponential. As a second step we take the expectation with respect to the independent standard Gaussian random variables $\xi^i_{j^*,k}$. The last inequality is obtained by choosing $L < \min\big\{\big(2\log(1 + 4(1-\alpha)^2)\big)^{1/4},\, R\sqrt{2}^{\,\min_u s_u}\big\}$. So (17) is proved by taking $C$ small enough, namely $C \le c_{i,\alpha} = 2^{-|i|\gamma_i} L$.
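The first step above uses the moment generating function of a Rademacher variable, $\mathbb{E}\, e^{a\zeta} = \cosh(a)$, so that the prior expectation factorizes into a product of cosh terms. A quick numerical check of this identity (all names illustrative):

```python
import numpy as np

def rademacher_mgf_mc(a, n=200_000, seed=0):
    """Monte Carlo estimate of E exp(sum_k a_k * zeta_k) for independent
    Rademacher zeta_k with parameter 1/2; the exact value is prod_k cosh(a_k)."""
    a = np.asarray(a, dtype=float)
    rng = np.random.default_rng(seed)
    zeta = rng.choice([-1.0, 1.0], size=(n, a.size))
    return float(np.mean(np.exp(zeta @ a)))
```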

Proof of Theorem 4.2
The proof of Theorem 4.2 follows the same path as that of Theorem 4.1. It is a direct consequence of Propositions 9.3 and 9.4, which respectively deal with the upper bound and the lower bound of the minimax result.
Remark 9.3 The constant $C_{i,\alpha}$ corresponds to the absolute positive constant $C_{i,\alpha}$ appearing in Definition 3.2; its value is given at the end of the proof below.
Proof: Consider $\varepsilon \in (0, \varepsilon_o)$ and suppose first that $f \in N_i(R)$. Then, for any $j \in J_i$, $\varepsilon^{-2} T_{i,j}$ has a Chi-square distribution with $2^{|j|}$ degrees of freedom. Suppose now that $f \in A_i(R, C, s, r_{\varepsilon,i})$, where $s$ satisfies $\sum_{u \in i} s_u^{-1} = \gamma_i^{-1}$. Then, for any fixed $j \in J_i$ such that $|j| = J_i$, $\varepsilon^{-2} T_{i,j}$ has a noncentral Chi-square distribution with $2^{J_i}$ degrees of freedom.
Hence, if $V^i_j$ denotes the linear span of $\{\psi^i_{j',k} : j' \in J_i,\ j'_u < \max(j_u, 1)\ \forall u,\ k \in K_{j'}\}$, one gets an upper bound of at most $(\log(2\alpha^{-1}))^{-1}$ provided that $C \ge C_{i,\alpha}$. The constant $C_{i,\alpha}$ is the value of $C$ for which the last inequality can be replaced by an equality. This leads to $P_f\big(\Delta_{i,\max}(t_{\varepsilon,i,\gamma_i,\alpha}) = 0\big) \le \alpha/2$ for any $C$ large enough, uniformly in $f$. We also note that (19) is obtained because we assumed that $\varepsilon < e^{-e}$.
Note, once again, that the smaller $\alpha$, the larger $C_{i,\alpha}$.
Proposition 9.4 Under the same assumptions as Theorem 4.2 and for any $C < c_{i,\alpha}$, the following inequality holds.
Remark 9.4 The constant $c_{i,\alpha}$ corresponds to the absolute positive constant $c_{i,\alpha}$ appearing in Definition 3.2; its value is given at the end of the proof below.
Proof: For any $\varepsilon \in (0, \varepsilon_o)$, consider the family of d-tuples $\mathcal{J} = \{j \in J_i : |j| = J_i\}$. Recall that: