Goodness-of-fit testing the error distribution in multivariate indirect regression

We propose a goodness-of-fit test for the distribution of errors from a multivariate indirect regression model. The test statistic is based on the Khmaladze transformation of the empirical process of standardized residuals. The resulting goodness-of-fit test is consistent, and it maintains power against local alternatives that converge to the null at the root-n rate.


Introduction
A common problem faced in applications is that one can only make indirect observations of a physical process. Consequently, important quantities of interest cannot be directly observed, but a suitable image under some transformation is typically available. These problems are called inverse problems in the literature. Loosely speaking, the goal is to recover a quantity θ (often a function) from a distorted version of an image Kθ, where K is some operator. Developing valid statistical inference procedures for these inverse problems is desirable, and in recent years several authors have worked on the construction of estimators, structural tests, and (pointwise and uniform) confidence bands for the unknown indirect regression function θ [see Mair and Ruymgaart (1996) or Cavalier (2008), among others]. We consider the indirect regression model

(1.1) Y_j = Kθ(X_j) + ε_j, j = 1, …, n,

where X_j is a predictor, ε_j is a random error and K is a convolution operator, which will be specified later (along with the covariates X_j). Here θ is an unknown but square-integrable smooth function. We study a unified approach to testing certain model assumptions regarding the distribution function of the error ε_j in the indirect regression model (1.1).
Apart from specification of the operator K, many statistical techniques used in applications for the estimation of θ depend on the error distribution. For example, when recovering astronomical images certain defects such as cosmic-ray hits are important to identify and remove [Section 6 of Adorf (1995)]. Here deviation values between observations from pixels and an initial reconstruction are calculated and compared with the standard deviation of the noise. A large deviation indicates the presence of a possible cosmic-ray hit, and observations from the affected pixels are discarded (or replaced by imputed values) in subsequent iterative reconstruction procedures that improve the quality of the final reconstructed image. Determining an unrealistic deviation depends on the structure of the noise distribution. More recently, Bertero et al. (2009) review maximum likelihood methods for reconstruction of distorted images, and, in their Section 5.2 on deconvolution using sparse representation, these authors note the popularity of assuming an additive Gaussian white noise model for transformed data. However, it is not known in advance whether this transformation is appropriate for a given image. If the transformation is inappropriate, then we can expect the Gaussian white noise model to also be inappropriate. The purpose of this paper is to help in answering some of these questions, which could be considered as goodness-of-fit hypotheses of specified error distributions.
Problems of this type have found considerable interest in direct regression models (this is the case where K is the identity operator and only θ appears in (1.1)) [see Darling (1955), Sukhatme (1972) or Durbin (1973) for some early works, or del Barrio et al. (2000) and Khmaladze and Koul (2004) for more recent references]. However, to the best of our knowledge the important case of testing distributional assumptions regarding the error structure of an indirect regression model of the form (1.1) has not been considered so far. We address this problem by proposing a test based on the empirical distribution function of the standardized residuals from an estimate of the regression function. The method rests on a projection principle introduced in the seminal papers of Khmaladze (1981, 1988). This projection is also called the Khmaladze transformation, and it has been well studied in the literature. Exemplarily, we mention the work of Marzec and Marzec (1997), Stute et al. (1998), Khmaladze and Koul (2004, 2009) and Can et al. (2015), who use the Khmaladze transform to construct goodness-of-fit tests for various problems. The work most similar in spirit to ours is the paper of Koul et al. (2018), who consider a similar problem in linear measurement error models.
We prefer the projection approach because there is a common asymptotic distribution describing the large sample behavior of the test statistics (without unknown parameters to be estimated) and the procedure can be easily adapted to handle different problems. To obtain a better understanding of projection principles as they relate to forming model checks, we direct the reader to consider the rather elaborate work of Bickel et al. (2006), who introduce a general framework for constructing tests of general semiparametric hypotheses that can be tailored to focus substantial power on important alternatives. These authors investigate a so-called score process obtained by a projection principle. Unfortunately, the resulting test statistics are generally not asymptotically distribution free, i.e. the asymptotic distributions of these test statistics generally depend on unknown parameters and inference using them becomes more complicated. The Khmaladze transform is simpler to specify and easily employed in regression problems, since test statistics obtained from the transformation are asymptotically distribution free with (asymptotic) quantiles immediately available.
The article is organized as follows. A brief discussion of Sobolev spaces and their appearance in statistical deconvolution problems is given in Section 2. In this section we further propose an estimator of the indirect regression function and study its statistical properties. The proposed test statistic is introduced in Section 3. Finally, Section 4 concludes the article with a numerical study of the proposed testing procedure and an application. The technical details and proofs of our results can be found in Section 5.

Estimating smooth indirect regressions
Consider the model (1.1) with the operator K specifying convolution between an unknown but smooth function θ and a known distortion function ψ that characterizes K, i.e.

(2.1) Kθ(x) = ∫_C ψ(x − u) θ(u) du, x ∈ C.
Here the covariates X_j are random and have support C = [0, 1]^m for some m ≥ 1. The model errors ε_1, …, ε_n are assumed to be independent with mean zero and common distribution function F admitting a Lebesgue density function, which is denoted by f throughout this paper. We also assume that ε_1, …, ε_n are independent of the i.i.d. covariates X_1, …, X_n. Throughout this article we will assume that the indirect regression function θ from (1.1) is periodic and smooth in the sense that θ belongs to the subspace of periodic, weakly differentiable functions from the class of square integrable functions L²(C) with support C; see Chapter 5 of Evans (2010) for definitions and additional discussion. For d ∈ N let I(d) be the set of multi-indices i = (i_1, …, i_m) satisfying i_• = i_1 + ⋯ + i_m ≤ d. To be precise, we will call a function q ∈ L²(C) weakly differentiable in L²(C) of order d when there is a collection of functions {q_i : i ∈ I(d)} ⊂ L²(C) such that

∫_C q(x) D^i ϕ(x) dx = (−1)^{i_•} ∫_C q_i(x) ϕ(x) dx

for every infinitely differentiable function ϕ, with ϕ and D^i ϕ, i ∈ I(d), vanishing at the boundary of C, and we write D^i q = q_i. The class of weakly differentiable functions from L²(C) of order d forms the Sobolev space W^{d,2}. The periodic Sobolev space W^{d,2}_per consists of those functions from W^{d,2} that are periodic on C and whose weak derivatives are also periodic on C. An orthonormal basis for the space L²(C) of square integrable functions is given by the Fourier basis {e^{i2πk·x} : x ∈ C}_{k∈Z^m}. Here k·x = k_1x_1 + ⋯ + k_mx_m is the common inner product between the vectors k = (k_1, …, k_m) ∈ Z^m and x = (x_1, …, x_m) ∈ C. It follows that W^{d,2}_per can be equivalently represented by

W^{d,2}_per = { q ∈ L²(C) : Σ_{k∈Z^m} (1 + ‖k‖²)^d |φ_q(k)|² < ∞ },

where ‖·‖ denotes the Euclidean norm and

φ_q(k) = ∫_C q(x) e^{−i2πk·x} dx, k ∈ Z^m,

are the Fourier coefficients of q [see Kühn et al. (2014) for further discussion]. The series in the equivalent representation of W^{d,2}_per motivates replacing the degree of weak differentiability d by a real-valued smoothness index s > 0.
Throughout this article we work with the general indirect regression model space

M(s) = { q ∈ L²(C) : Σ_{k∈Z^m} (1 + ‖k‖²)^s |φ_q(k)|² < ∞ }.

We will assume that θ ∈ M(s_0), for some s_0 specified below, and that ψ ∈ L²(C) is positive-valued and integrates to 1, so that K is a convolution operator from L²(C) into L²(C). In this case we can represent Kθ in terms of a Fourier series

(2.3) Kθ(x) = Σ_{k∈Z^m} R(k) e^{i2πk·x},

where {R(k)}_{k∈Z^m} and {Θ(k)}_{k∈Z^m} are the Fourier coefficients of Kθ and θ, respectively. In particular we have

R(k) = Ψ(k) Θ(k), k ∈ Z^m,

writing {Ψ(k)}_{k∈Z^m} for the Fourier coefficients of ψ. Studying the indirect regression model (1.1) requires that we consider the ill-posedness of the inverse problem. This phenomenon occurs because the ratio |R(k)|/|Ψ(k)| needs to be summable when θ ∈ M(s). However, when estimated Fourier coefficients {R̂(k)}_{k∈Z^m} are used, |R̂(k)| does not asymptotically vanish (with increasing ‖k‖) due to the stochastic noise from the errors ε_j in model (1.1). Consequently, the ratio |R̂(k)|/|Ψ(k)| is not necessarily summable, and the problem is therefore called ill-posed. We can see that the coefficients {Ψ(k)}_{k∈Z^m} determine the rate at which the ratio |R̂(k)|/|Ψ(k)| grows, and, therefore, the ill-posedness of the inverse problem is governed by the rate of decay of the coefficients {Ψ(k)}_{k∈Z^m} of the distortion function ψ. We will assume that the inverse problem is mildly to moderately ill-posed in the sense of Fan (1991):

Assumption 1. There are finite constants b ≥ 0, γ > 0 and 0 ≤ C_Ψ < C*_Ψ such that, for every ‖k‖ > γ, the Fourier coefficients {Ψ(k)}_{k∈Z^m} of the function ψ in (2.1) satisfy

C_Ψ ‖k‖^{−b} ≤ |Ψ(k)| ≤ C*_Ψ ‖k‖^{−b}.

Under Assumption 1, whenever θ ∈ M(s_0), for some s_0 > 0, it follows that Kθ ∈ M(s_0 + b) from the celebrated convolution theorem for the Fourier transformation. This means that convolution of the indirect regression function θ with the distortion function ψ adds smoothness, and the resulting distorted regression function Kθ is smoother than θ by exactly the degree of ill-posedness b of the inverse problem.
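To make the smoothing effect concrete, here is a small worked example; the specific coefficient decay below is assumed purely for illustration and is not the distortion function used later in the paper. Using the convolution theorem identity R(k) = Ψ(k)Θ(k), a kernel whose coefficients satisfy |Ψ(k)|² = (1 + ‖k‖²)^{−2} gives

```latex
% Illustrative example (assumed coefficients): if |\Psi(k)|^2 = (1+\|k\|^2)^{-2},
% then for any \theta \in \mathcal{M}(s_0),
\sum_{k\in\mathbb{Z}^m} (1+\|k\|^2)^{s_0+2}\,|\Psi(k)\Theta(k)|^2
  \;=\; \sum_{k\in\mathbb{Z}^m} (1+\|k\|^2)^{s_0}\,|\Theta(k)|^2 \;<\; \infty,
```

so θ ∈ M(s_0) implies Kθ ∈ M(s_0 + 2): this kernel has degree of ill-posedness b = 2, and convolution lifts the smoothness index by exactly 2.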
Note that Assumption 1 is milder than that of Fan (1991) in the sense that we allow the degree of ill-posedness b = 0 and that the scaled Fourier coefficients can vanish. This covers the case of direct regression models where K is the identity operator, that is Kθ = θ. Further note that we do not have to invert the operator K in order to investigate properties of the error distribution in the indirect regression model (1.1).
Several techniques have been developed in the literature to derive series-type estimators (see, for example, Cavalier, 2008). A popular regularization method is the so-called spectral cut-off method, where an indicator function is introduced in (2.3). For example, the indicator function 1[‖c_n k‖ ≤ 1] (for some sequence {c_n}_{n≥1} converging to 0) results in a biased version of Kθ:

(Kθ)_n(x) = Σ_{k∈Z^m} 1[‖c_n k‖ ≤ 1] R(k) e^{i2πk·x}.

The proposed estimator is obtained by replacing the coefficients {R(k)}_{k∈Z^m} with consistent estimators {R̂(k)}_{k∈Z^m}, which gives an estimator of (Kθ)_n. The sequence of smoothing parameters {c_n}_{n≥1} is chosen such that Kθ is consistently estimated. We can generalize this approach as follows.
Following Politis and Romano (1999) we consider a Fourier smoothing kernel Λ, where Λ is defined to be the Fourier transformation of some smoothing kernel function, say L_Λ. The resulting estimate is then defined by

(2.5) K̂θ(x) = Σ_{k∈Z^m} Λ(c_n k) R̂(k) e^{i2πk·x}, x ∈ C.

Another useful observation of Politis and Romano (1999) is that the function x ↦ c_n^{−m} L_Λ(c_n^{−1} x) has Fourier coefficients {Λ(c_n k)}_{k∈Z^m}. Throughout this paper we will choose Λ as follows:

Assumption 2. The Fourier smoothing kernel Λ satisfies Λ(k) = 1 for ‖k‖ ≤ 1, |Λ(k)| ≤ 1 for ‖k‖ > 1, and ∫_{R^m} ‖u‖ |Λ(u)| du < ∞.

The random covariates X_1, …, X_n from model (1.1) are assumed to be independent with distribution function G. For simplicity we will assume that G satisfies the following properties.

Assumption 3. The covariate distribution function G admits a positive Lebesgue density function g ∈ L²(C) satisfying inf_{x∈C} g(x) > 0 and sup_{x∈C} g(x) < ∞, and g ∈ M(s) for some s > 0.
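For intuition, the spectral cut-off weights can be evaluated explicitly in one dimension. The following sketch (a minimal illustration for m = 1; the function names are ours) computes W_c(x) = Σ_{|k| ≤ 1/c} e^{i2πkx} both as a direct sum and through the closed-form Dirichlet kernel that this sum equals:

```python
import cmath
import math

def spectral_cutoff_weight(x, c):
    """Smoothing weight W_c(x) = sum over |k| <= 1/c of e^{i 2 pi k x} (m = 1).

    This is the spectral cut-off choice Lambda(u) = 1[|u| <= 1]; the sum is the
    classical Dirichlet kernel with highest retained frequency K = floor(1/c).
    """
    K = int(math.floor(1.0 / c))
    return sum(cmath.exp(2j * math.pi * k * x) for k in range(-K, K + 1))

def dirichlet_closed_form(x, c):
    """Closed form sin((2K+1) pi x) / sin(pi x) of the same weight."""
    K = int(math.floor(1.0 / c))
    if abs(math.sin(math.pi * x)) < 1e-12:  # removable singularity at integers
        return float(2 * K + 1)
    return math.sin((2 * K + 1) * math.pi * x) / math.sin(math.pi * x)
```

The closed form shows how the weights concentrate as c_n decreases: W_{c_n}(0) = 2⌊1/c_n⌋ + 1 grows while the weights oscillate and decay away from the origin.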
The boundedness assumptions on g are common in nonparametric regression because these conditions guarantee good performance of nonparametric function estimators. The last condition ensures that the density function g has smoothness properties similar to those of the indirect regression function θ, which allows us to use a Fourier series technique to specify a good estimator of g (see, for example, Politis and Romano, 1999).
What remains is to define the estimators {R̂(k)}_{k∈Z^m} of the Fourier coefficients {R(k)}_{k∈Z^m} required in the definition (2.5). Observing the representation

R(k) = E[ Y_1 e^{−i2πk·X_1} / g(X_1) ], k ∈ Z^m,

the covariate density function g must be estimated. For this purpose we expand the density function g into its Fourier series using the coefficients

φ_g(k) = ∫_C g(x) e^{−i2πk·x} dx, k ∈ Z^m.

Estimators of these coefficients are given by

φ̂_g(k) = n^{−1} Σ_{j=1}^n e^{−i2πk·X_j}, k ∈ Z^m.

From these estimators we then obtain an estimator ĝ of the unknown covariate density function g, that is

(2.6) ĝ(x) = Σ_{k∈Z^m} Λ(c_n k) φ̂_g(k) e^{i2πk·x} = n^{−1} Σ_{j=1}^n W_{c_n}(x − X_j),

with smoothing weights

(2.7) W_{c_n}(x) = Σ_{k∈Z^m} Λ(c_n k) e^{i2πk·x} = c_n^{−m} L_Λ(c_n^{−1} x).

Here (as before) the choice of Λ defines the form of the smoothing weights W_{c_n}. The sequence {c_n}_{n≥1} of smoothing parameters is specified later. We now propose to estimate the Fourier coefficients {R(k)}_{k∈Z^m} of the distorted regression function Kθ by

R̂(k) = n^{−1} Σ_{j=1}^n Y_j e^{−i2πk·X_j} / ĝ(X_j), k ∈ Z^m,

where the density estimator ĝ is specified in (2.6). This gives for the nonparametric Fourier series estimator in (2.5) the representation

(2.8) K̂θ(x) = n^{−1} Σ_{j=1}^n ( Y_j / ĝ(X_j) ) W_{c_n}(x − X_j),

where the smoothing weights W_{c_n} are defined in (2.7). The results of Lemma 2 in Section 5 show that the consistency of the estimated Fourier coefficients {R̂(k)}_{k∈Z^m} depends heavily on the consistency of the covariate density estimator ĝ. This fact motivates our choice of smoothing parameters as

(2.9) c_n = O( n^{−1/(2s_0+2b+3m)} log^{1/(2s_0+2b+3m)}(n) )

and requiring that the covariate density function g has smoothness index s = s_0 + b + m in Assumption 3, where s_0 is the smoothness index of the function class M(s_0) to which θ belongs, b is the degree of ill-posedness of the inverse problem and m is the dimension of the covariates. Our first result establishes the uniform consistency of the estimator K̂θ in (2.5) and a further technical metric space inclusion property that is useful for working with residual-based empirical processes.
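A minimal numerical sketch of the series density estimator for m = 1, under the assumption of uniform covariates on [0, 1] (so the target is g ≡ 1); all names and parameter values here are illustrative:

```python
import cmath
import math
import random

def fourier_density_estimate(xs, c, x):
    """Series density estimate: sum over retained k of phi_hat(k) e^{i 2 pi k x},
    using the spectral cut-off kernel Lambda(u) = 1[|u| <= 1] (m = 1)."""
    n = len(xs)
    K = int(math.floor(1.0 / c))  # retained frequencies: |k| <= K
    ghat = 0.0
    for k in range(-K, K + 1):
        # empirical Fourier coefficient phi_hat(k) of the covariate density
        phi_hat = sum(cmath.exp(-2j * math.pi * k * xj) for xj in xs) / n
        ghat += (phi_hat * cmath.exp(2j * math.pi * k * x)).real
    return ghat

random.seed(1)
sample = [random.random() for _ in range(2000)]  # uniform covariates, g = 1
est = [fourier_density_estimate(sample, 0.4, x / 10) for x in range(10)]
```

Since φ̂(0) = 1 exactly and the remaining empirical coefficients are of order n^{−1/2}, the estimate fluctuates around the true density g ≡ 1, illustrating the bias-variance role of the cut-off 1/c.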
Theorem 1. Let θ ∈ M(s_0) for some s_0 > 0 and let Assumption 1 hold for some degree of ill-posedness b ≥ 0. Let Assumption 2 hold for a Fourier smoothing kernel Λ that satisfies ∫_{R^m} ‖u‖^{max{s_0+b,1}} |Λ(u)| du < ∞. Further let Assumption 3 hold for s = s_0 + b + m and assume that the errors ε_1, …, ε_n have a finite absolute moment of order κ > 2. Choose the smoothing parameter c_n as in (2.9). Then the estimator K̂θ in (2.5) is uniformly consistent for Kθ on C, almost surely.

Goodness-of-fit testing the error distribution
In this section we consider the problem of goodness-of-fit testing of a location-scale distribution of the errors in the indirect regression model (1.1) with convolution operator (2.1). Here the location parameter is the mean of the errors and equal to zero, but the scale parameter is unknown. The null hypothesis is given by

(3.1) H_0 : f(·) = σ^{−1} f*(·/σ) for some σ > 0,

where f* is a specified density function of the standardized error distribution and σ is the unknown scale parameter. To simplify notation we write f_σ(·) = σ^{−1} f*(·/σ) for the error density under the null hypothesis, so that the standardized errors Z_j = ε_j/σ (j = 1, …, n) have density f*, and F_σ(t) = ∫_{−∞}^t f_σ(y) dy (t ∈ R) for the corresponding distribution function. With this notation the null hypothesis in (3.1) becomes H_0 : F = F_σ for some σ > 0, and we write F*(t) = ∫_{−∞}^t f*(y) dy (t ∈ R) for the error distribution function specified by the null hypothesis.
Following Müller et al. (2012), who consider a similar problem in the direct case, we propose to use the standardized residuals

Ẑ_j = ε̂_j / σ̂, j = 1, …, n,

to form a suitable test statistic, where ε̂_j = Y_j − K̂θ(X_j) (j = 1, …, n) are the residuals in the indirect regression model (1.1) obtained for the estimate (2.8) and

σ̂ = ( n^{−1} Σ_{j=1}^n ε̂_j² )^{1/2}

is a consistent estimator of the scale parameter σ. A nonparametric estimator of F* is given by the empirical distribution function of these standardized residuals,

F̂(t) = n^{−1} Σ_{j=1}^n 1[ Ẑ_j ≤ t ], t ∈ R.

The null hypothesis H_0 is then rejected if a given metric between the estimated standardized distribution function F̂ and F* is large enough. A popular metric in the literature is the supremum metric, and this leads to the Kolmogorov-Smirnov test statistic

sup_{t∈R} n^{1/2} | F̂(t) − F*(t) |.

Critical values for the Kolmogorov-Smirnov test statistic are then determined from asymptotic theory, but these can be difficult to work with in practice because they depend on F*. To avoid this problem, we will work with a different test statistic.
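The standardization, the empirical distribution function F̂ and the Kolmogorov-Smirnov distance can be sketched as follows; this is a simplified illustration in which simulated errors stand in for the residuals ε̂_j, and the normal null is used only as an example:

```python
import math
import random

def standard_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def ks_statistic(residuals, null_cdf):
    """sup_t sqrt(n) |F_hat(t) - F*(t)| for the EDF of standardized residuals.

    The supremum of |EDF - continuous CDF| is attained at the jump points, so
    it suffices to evaluate the difference at the order statistics.
    """
    n = len(residuals)
    sigma_hat = math.sqrt(sum(e * e for e in residuals) / n)  # scale estimate
    z = sorted(e / sigma_hat for e in residuals)              # standardized residuals
    sup = 0.0
    for j, t in enumerate(z):
        f = null_cdf(t)
        sup = max(sup, abs((j + 1) / n - f), abs(j / n - f))
    return math.sqrt(n) * sup

random.seed(2)
errors = [random.gauss(0.0, 0.5) for _ in range(500)]  # sigma = 1/2 as in Section 4
stat = ks_statistic(errors, standard_normal_cdf)
```

Under the null the statistic stays moderate; the practical difficulty motivating the next subsection is that its limiting quantiles depend on F*.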
Our proposed test statistic will crucially depend on the estimatorF satisfying an asymptotic expansion, which is given in the following result.
Theorem 2. Let the assumptions of Theorem 1 hold with s_0 + b > 3m/2 and assume that the Fourier smoothing kernel Λ is radially symmetric. Let F* have a finite absolute moment of order 4 or larger and a bounded Lebesgue density f* that is (uniformly) Hölder continuous with exponent 3m/(2s_0 + 2b) < γ ≤ 1. Finally, the function t ↦ tf*(t) is assumed to be uniformly continuous and bounded. Then, under the null hypothesis (3.1), the residual-based empirical process n^{1/2}(F̂ − F*) converges weakly to a Gaussian process, which admits an explicit representation as the weak limit of a simpler stochastic process. This limit distribution can be easily simulated. However, it is clearly not distribution free because it depends on F* and f* specified in the null hypothesis.
In order to obtain a test statistic whose critical values are independent of the distribution specified in the null hypothesis, we use a particular projection of the residual-based empirical process by viewing this quantity as an (approximate) semimartingale with respect to its natural filtration. The projection is given by the Doob-Meyer decomposition of this semimartingale (see page 1012 of Khmaladze and Koul, 2004). For this purpose we will assume that F* has finite Fisher information for location and scale, i.e.

∫ ( f*'(t) / f*(t) )² dF*(t) < ∞ and ∫ ( 1 + t f*'(t) / f*(t) )² dF*(t) < ∞,

writing f*' for the derivative of the Lebesgue density f*. The Khmaladze transformation produces a standard limiting distribution: a standard Brownian motion B on [0, 1]. As a consequence we can construct test statistics which are asymptotically distribution free, i.e. the corresponding critical values do not depend on the distribution F* specified by the null hypothesis.
To be precise, note that F* characteristically has mean zero and variance equal to one. In order to introduce our test statistic we define the augmented score function

(3.3) h(t) = ( 1, −f*'(t)/f*(t), −1 − t f*'(t)/f*(t) )^T

and the incomplete information matrix

Γ(t) = ∫_t^∞ h(s) h(s)^T dF*(s).

Following Khmaladze and Koul (2009) the transformed empirical process of standardized residuals is given by

ξ̂_0(t) = n^{1/2} [ F̂(t) − ∫_{−∞}^{t∧t_0} h(s)^T Γ^{−1}(s) ( ∫_s^∞ h(u) dF̂(u) ) dF*(s) ],

for some t_0 < ∞. We can rewrite ξ̂_0 in a more computationally friendly form as a finite sum over the standardized residuals.
In general, the incomplete information matrix Γ does not have a simple form, and Γ(t_0) degenerates as t_0 → ∞. To avoid this degeneracy issue we proceed as in Stute et al. (1998), who recommend using the 99% quantile of the empirical distribution function F̂ for t_0, i.e. t_0 = F̂^{−1}(0.99), writing F̂^{−1} for the sample quantile function associated with F̂. We propose to base a goodness-of-fit test for the hypothesis (3.1) on the supremum metric between ξ̂_0/(F̂(t_0))^{1/2} and the constant 0:

T_0 = sup_{t ≤ t_0} | ξ̂_0(t) | / ( F̂(t_0) )^{1/2}.

The test statistic T_0 has an asymptotic distribution given by sup_{0≤s≤1} |B(s)| under the null hypothesis (3.1), where B denotes a standard Brownian motion.
Our proposed goodness-of-fit test for the null hypothesis (3.1) is then defined by

(3.5) reject H_0 whenever T_0 > q_α,

where q_α is the upper α-quantile of the distribution of sup_{0≤s≤1} |B(s)|. The value of q_α may be obtained from formula (7) on page 34 of Shorack and Wellner (1986), i.e.

P( sup_{0≤s≤1} |B(s)| ≤ q ) = (4/π) Σ_{k=0}^∞ ( (−1)^k / (2k+1) ) exp( −π²(2k+1)² / (8q²) ).
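The quantile q_α can be computed numerically from the classical series expansion for the distribution of sup_{0≤s≤1} |B(s)| (cf. Shorack and Wellner, 1986); the sketch below recovers q_{0.05} ≈ 2.2414 by bisection:

```python
import math

def sup_abs_bm_cdf(x, terms=100):
    """P(sup_{0<=s<=1} |B(s)| <= x) for a standard Brownian motion B,
    via the classical alternating series."""
    if x <= 0:
        return 0.0
    s = 0.0
    for k in range(terms):
        s += ((-1) ** k / (2 * k + 1)) * math.exp(
            -math.pi ** 2 * (2 * k + 1) ** 2 / (8.0 * x * x))
    return 4.0 / math.pi * s

def sup_abs_bm_quantile(alpha):
    """Upper alpha-quantile q_alpha, i.e. P(sup |B| > q_alpha) = alpha,
    found by bisection on the monotone CDF."""
    lo, hi = 0.1, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sup_abs_bm_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `sup_abs_bm_quantile(0.05)` reproduces the critical value 2.2414 used in Section 4.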

Finite sample properties
We conclude the article with a numerical study of the previous results with two examples and an application of the proposed test. Throughout this section we consider a goodness-of-fit test for normally distributed errors in the indirect regression model (1.1), i.e. the null hypothesis (3.1) with f* = φ, the density function of the standard normal distribution.
Note that in this case a straightforward calculation shows that the augmented score function h and the incomplete information matrix Γ from (3.3) become particularly simple, that is

h(t) = ( 1, t, t² − 1 )^T and Γ(t) = ∫_t^∞ h(s) h(s)^T φ(s) ds,

whose entries have closed-form expressions in terms of Φ and φ, writing Φ and φ for the respective distribution and density functions of the standard normal distribution.
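For the standard normal case the entries of Γ(t) reduce to expressions in Φ and φ via the usual Gaussian moment identities (e.g. ∫_t^∞ s φ(s) ds = φ(t)). The sketch below is our own derivation, offered as an illustration: it evaluates the closed form and checks it against Simpson-rule quadrature:

```python
import math

def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def gamma_closed_form(t):
    """Entries of Gamma(t) = int_t^inf h h^T dPhi with h(s) = (1, s, s^2 - 1)^T,
    obtained from the Gaussian moment identities (our derivation)."""
    S, p = 1.0 - Phi(t), phi(t)
    return [[S,         p,                   t * p],
            [p,         S + t * p,           (t * t + 1.0) * p],
            [t * p,     (t * t + 1.0) * p,   2.0 * S + (t ** 3 + t) * p]]

def gamma_numeric(t, upper=12.0, n=4000):
    """Simpson-rule check of the same matrix (n must be even)."""
    width = (upper - t) / n
    G = [[0.0] * 3 for _ in range(3)]
    for i in range(n + 1):
        s = t + i * width
        w = 1 if i in (0, n) else (4 if i % 2 == 1 else 2)
        hs = (1.0, s, s * s - 1.0)
        for a in range(3):
            for b in range(3):
                G[a][b] += w * hs[a] * hs[b] * phi(s)
    return [[G[a][b] * width / 3.0 for b in range(3)] for a in range(3)]
```

As t → −∞ the closed form tends to diag(1, 1, 2), the full information matrix of the augmented score under the standard normal distribution.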
4.1. Simulation study. In the first example we generate independent bivariate covariates X_j = (X_{1,j}, X_{2,j})^T with independent and identically distributed components X_{1,j} and X_{2,j} (j = 1, …, n) as follows. The common distribution of X_{1,j} and X_{2,j} is characterized by a density function g(x_1, x_2), which is depicted in the left panel of Figure 1. One can easily verify that g is a probability density function and satisfies the requirements of Assumption 3 for any s > 0. The random sample of covariates X_1, …, X_n is then generated from the distribution characterized by the non-trivial density function g using a standard probability integral transform approach. In the second example we use independently, uniformly distributed covariates in the unit square [0, 1]². The distortion function ψ is taken as the product of two (normalized) Laplace density functions restricted to the interval [0, 1], each with mean 1/2 and scale 1/10. The Fourier coefficients of the distortion function ψ are available in closed form, and this choice indeed satisfies Assumption 1 with b = 2. When nonparametric smoothing is performed we work with the radially symmetric spectral cut-off kernel characterized by the Fourier coefficient function Λ(c_n k) = 1[‖c_n k‖ ≤ 1], k ∈ Z², with smoothing parameter c_n chosen by minimizing the leave-one-out cross-validated estimate of the mean squared prediction error (see, for example, Härdle and Marron, 1985). This choice is practical, simple to implement and performed well in our study.
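The leave-one-out cross-validation criterion can be illustrated with a simple smoother. The sketch below uses a Nadaraya-Watson estimator with a Gaussian kernel as a stand-in for the series estimator of Section 2; all data and parameter values are hypothetical:

```python
import math
import random

def loo_cv_score(xs, ys, bandwidth):
    """Leave-one-out cross-validated prediction error: each observation j is
    predicted from the remaining n - 1 points and the squared errors averaged."""
    n = len(xs)
    err = 0.0
    for j in range(n):
        num = den = 0.0
        for k in range(n):
            if k == j:
                continue
            w = math.exp(-0.5 * ((xs[j] - xs[k]) / bandwidth) ** 2)
            num += w * ys[k]
            den += w
        err += (ys[j] - num / den) ** 2
    return err / n

random.seed(3)
xs = [random.random() for _ in range(150)]
ys = [math.sin(2 * math.pi * x) + random.gauss(0.0, 0.2) for x in xs]
grid = [0.02, 0.05, 0.1, 0.2, 0.4]
best = min(grid, key=lambda c: loo_cv_score(xs, ys, c))
```

The criterion penalizes both undersmoothing (large variance of the leave-one-out predictions) and oversmoothing (large bias), so the grossly oversmoothed bandwidth is never selected here.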
The indirect regression function θ is chosen as a smooth periodic function that is easily seen to belong to M(s_0) for any s_0 > 0. Following the previous discussion, the distorted regression function Kθ belongs to M(s_0 + 2) for any s_0 > 0. In the middle and right panels of Figure 1 we display the indirect regression function θ and the distorted regression function Kθ. We considered four scenarios: normally distributed errors with standard deviation σ = 1/2; Laplace distributed errors with scale parameter σ = 1/2; centered, skew-normal errors with scale parameter σ = 1 and skew parameter α = 3 (standard deviation 0.2265); Student's t distributed errors with ν = 6 degrees of freedom (standard deviation 1.2247). The first scenario allows us to check the level of the proposed test statistic T_0, and the other three scenarios allow for observing the simulated power of the proposed test. Here we work with a 5%-level test, and the quantile q_{0.05} is 2.2414.

Table 1. Simulated power of the goodness-of-fit test (3.5) for normally distributed errors at the 5% level with sample sizes 100, 200, 300 and 500 and with covariates having the non-trivial distribution characterized by the density function g. The first row corresponds to N(0, (1/2)²) distributed errors. The remaining rows display the powers of the test under the fixed alternative error distributions: Laplace with scale parameter σ = 1/2; centered, skew-normal with scale parameter σ = 1 and skew parameter α = 3; Student's t with ν = 6 degrees of freedom.

Table 2. Simulated power of the goodness-of-fit test (3.5) for normally distributed errors at the 5% level with sample sizes 100, 200, 300 and 500 and with covariates independently, uniformly distributed in [0, 1]². The first row corresponds to N(0, (1/2)²) distributed errors. The remaining rows display the powers of the test under the fixed alternative error distributions: Laplace with scale parameter σ = 1/2; centered, skew-normal with scale parameter σ = 1 and skew parameter α = 3; Student's t with ν = 6 degrees of freedom.
We perform 1000 simulation runs with samples of sizes 100, 200, 300 and 500. Table 1 displays the results for the first example (when the covariates have the non-trivial distribution characterized by the density function g) and Table 2 displays the results for the second example (when the covariates are independently, uniformly distributed in the unit square [0, 1]²). Beginning with the first example, at the sample size 100 the test rejected the null hypothesis in 4.8% of the cases (near the desired 5%), but at the sample sizes 200 and 300 the test rejected the null hypothesis in 9.8% and 7.2% of the cases, respectively, which are both above the desired 5% nominal level. However, at the sample size 500 the test rejected the null hypothesis in 5.2% of the cases, which is again near the nominal level of 5%. We expect that this behavior is due to the data-driven smoothing parameter selection. Interestingly, in the second example the test is slightly conservative at all of the simulated sample sizes (e.g. rejecting in 3.2% of the cases at sample size 300), but with sample size 500 the test rejected the null hypothesis in 4.8% of the cases (near the nominal level of 5%), which coincides with the first example.
Turning our attention now to the power of the test, in the first example we can see that the test performs well for moderate and larger sample sizes. At the sample size 100 the test rejected under the alternative error distributions Laplace, skew-normal and Student's t in only 20.9%, 13.6% and 21.1% of the cases, respectively, but at the sample size 500 it rejected under these alternatives in 91.4%, 82.8% and 78.6% of the cases, respectively. In the second example, the power of the test is noticeably higher at the smaller sample sizes (rejecting under the alternative distributions in 31.8%, 22.6% and 27% of the cases at sample size 100), with smaller improvements at the larger sample sizes (rejecting under the alternative distributions in 97.9%, 94.3% and 81.5% of the cases at sample size 500). In conclusion, it appears that the proposed test statistic T_0 is an effective tool for testing the goodness-of-fit of a desired error distribution in indirect regression models.

4.2. Data example. Since the Anscombe transformation Y ↦ 2(Y + 3/8)^{1/2} approximately stabilizes the variance of count data and renders it approximately normally distributed, we apply this transformation to the HeLa cells image data considered in this section, and then apply the test (3.5) to check the assumption of normally distributed errors (at the 5% level) from a reconstruction of this image using the previously studied results. We use the computing language R with the package OpenImageR, which allows for reading the image data and conducting our analysis.
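The Anscombe transformation, and the algebraic inverse used below for displaying reconstructions, can be sketched as follows (our analysis uses R; this Python sketch is purely illustrative):

```python
import math

def anscombe(y):
    """Anscombe variance-stabilizing transformation for a count y."""
    return 2.0 * math.sqrt(y + 3.0 / 8.0)

def inverse_anscombe(t):
    """Algebraic inverse, used to map fitted values back to the count scale."""
    return (t / 2.0) ** 2 - 3.0 / 8.0
```

For visual comparison of a reconstruction with the original data, the simple algebraic inverse applied to the fitted values is the natural choice.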
Since the total number of observations is quite large, we instead illustrate the test for normal errors using two smaller sections of the original HeLa image. To display the reconstructions of the smaller images (for visual comparison with the original data) we apply the inverse of the Anscombe transformation to the fitted values of each regression. In both examples, the pixels are mapped to midpoints of appropriate grids of the unit square [0, 1]². The first image we consider is 32 × 32 pixels comprising 1024 observations and is displayed in Figure 3 alongside its reconstructed version and a normal QQ-plot of the resulting standardized regression residuals (see Section 3). The second image we consider is 64 × 64 pixels comprising 4096 observations and is displayed in Figure 4 alongside its reconstructed version and a normal QQ-plot of the

resulting standardized regression residuals. In both cases, as in Section 4.1, when nonparametric smoothing is applied the smoothing parameter is chosen by minimizing the leave-one-out cross-validated estimate of the mean squared prediction error. Beginning with the first and smaller image, the martingale transform test statistic T_0 that assesses the goodness-of-fit of a normal distribution has the value 1.5141, which is smaller than 2.2414, and the null hypothesis of normally distributed errors is not rejected. Inspecting the QQ-plot of these standardized residuals, the assumption of normally distributed errors appears appropriate, which confirms our previous finding. In this case, we can see that the reconstruction very closely mirrors the original.
Turning now to the second and larger image, the value of the test statistic is 39.8324, which is much larger than 2.2414, and we reject the null hypothesis of normally distributed errors. The QQ-plot of the standardized residuals now appears to contain systematic deviation from normality, which confirms that the hypothesis of normally distributed errors is inappropriate. Here we can see that the reconstruction is not as accurate as it was in the previous case. In conclusion, we can see that the approach of using the proposed test statistic T_0 for assessing convenient forms of the error distribution is useful.

Acknowledgments. … "medical data for improved diagnosis, supervision and drug development". The authors would also like to thank Kathrin Bissantz for providing us the HeLa cells image used in our data example.

Appendix
In this section we give the technical details supporting our results. We have the following uniform convergence property for the density estimator ĝ.

Lemma 1. Let the Fourier smoothing kernel Λ be as in Assumption 2, and let Assumption 3 hold with s > 0. Then, for any smoothing parameter sequence {c_n}_{n≥1} satisfying (nc_n^m)^{−1} log(n) → 0 as c_n → 0 with n → ∞,

sup_{x∈C} | ĝ(x) − g(x) | = O( c_n^s + (nc_n^m)^{−1/2} log^{1/2}(n) ), almost surely.

Proof. We first bound the bias term (and note that |Λ(c_n k) − 1| = 0 whenever ‖k‖ ≤ c_n^{−1}). Using the representation L_Λ(x) = Σ_{k∈Z^m} Λ(k) e^{i2πk·x} and the fact that {Λ(c_n k)}_{k∈Z^m} are the Fourier coefficients of the function L_Λ(·/c_n)/c_n^m, one calculates the bias directly; in addition, L_Λ is bounded. We will now demonstrate that max_{k∈Z^m} |φ̂_g(k) − φ_g(k)| = O(n^{−1/2} log^{1/2}(n)), almost surely. Let k ∈ Z^m be arbitrary and write

φ̂_g(k) − φ_g(k) = n^{−1} Σ_{j=1}^n ( e^{−i2πk·X_j} − E[ e^{−i2πk·X} ] ),

where X is a generic random variable with distribution characterized by the density function g.
The complex exponential functions are bounded in absolute value by 1, and it is easy to verify that Var[exp(i2πk·X)] ≤ 1. As above, use Bernstein's inequality, choosing a large enough positive constant C (through the choice of the quantity O(n^{−1/2} log^{1/2}(n))), to obtain a tail bound that is summable in n. This occurs when C > 1, independently of k. It follows that max_{k∈Z^m} |φ̂_g(k) − φ_g(k)| = O(n^{−1/2} log^{1/2}(n)), almost surely. Further, let C_i be arbitrary. For any x ∈ C_i, one uses Euler's formula and the fact that sine and cosine are Lipschitz functions with constant equal to one to derive the corresponding increment bound. Combining (5.6) with (5.5), there is a positive constant C > 0 such that the assertion of the lemma follows.

Proof of Lemma 2. Let k ∈ Z^m be arbitrary and decompose the estimation error R̂(k) − R(k) into the sum of four terms T_1(k), …, T_4(k). Since θ ∈ M(s_0) for some s_0 > 0, it follows that Kθ is bounded, and a standard argument shows that max_{k∈Z^m} |T_1(k)| is of the order O(n^{−1/2} log^{1/2}(n)) = o(c_n^s + (nc_n^m)^{−1/2} log^{1/2}(n)), almost surely. Analogously, max_{k∈Z^m} |T_2(k)| is of the order o(c_n^s + (nc_n^m)^{−1/2} log^{1/2}(n)), almost surely. From the result of Lemma 1 we can see that max_{k∈Z^m} |T_3(k)| is of the order O(c_n^s + (nc_n^m)^{−1/2} log^{1/2}(n)), almost surely. Finally, with some technical effort one shows that max_{k∈Z^m} |T_4(k)| is of the order o(c_n^s + (nc_n^m)^{−1/2} log^{1/2}(n)), almost surely.
We are now ready to state the proof of Theorem 1.
Proof of Theorem 1. The assertion follows by combining Lemma 1 and Lemma 2 with the series representation of K̂θ − Kθ induced by (2.5).

The proof of Theorem 2 follows from the above results with an additional property of the distorted regression estimator K̂θ and an approximation result for the difference σ̂² − σ².

Proposition 1. Choose the Fourier smoothing kernel Λ to be radially symmetric. Then the estimator K̂θ enjoys the property that

n^{−1} Σ_{j=1}^n ( K̂θ(X_j) − Kθ(X_j) ) − n^{−1} Σ_{j=1}^n ε_j = 0.
If the assumptions of Theorem 1 are satisfied with s_0 + b > 3m/2, then the estimator σ̂ admits an asymptotic expansion of the difference σ̂² − σ² in terms of the model errors.

Proof. Since Λ is radially symmetric, we have that W_{c_n}(X_j − X_k) = W_{c_n}(X_k − X_j) for every 1 ≤ j, k ≤ n.
One combines this fact with the additional fact that |Y j | is finite with probability 1 for each 1 ≤ j ≤ n to finish the proof of the first assertion.
To show the second assertion we need to use the results of Theorem 1 as follows. Write the difference σ̂² − σ² as a sum of an average over the model errors and remainder terms (among them R_{2,n}). Analogously to the proof of Lemma 2, one treats max_{k∈Z^m} | n^{−1} Σ_{j=1}^n ε_j exp(i2πk·X_j) | using a standard argument and finds this quantity is of the order O(n^{−1/2} log^{1/2}(n)), almost surely. For the quantities inside the large brackets, one uses Lemma 2 and handles the series term as in the proof of Lemma 1 to show that the first term is of the order O(c_n^{s−m}) = O(c_n^{s_0+b}) (since s = s_0 + b + m), and the second term is easily shown to be of the order O(c_n^{s_0+b}) (see the proof of Lemma 1). Therefore, |R_{2,n}| is of the order O(c_n^{s_0+b} n^{−1/2} log^{1/2}(n)) = o(n^{−1/2}), almost surely.
Neumeyer and Van Keilegom (2010) consider estimation of the distribution function of the standardized errors using a residual-based empirical distribution function based on nonparametric regression residuals obtained by local polynomial smoothing. These authors obtain asymptotic negligibility of a modulus of continuity relating their residual-based empirical distribution function to the empirical distribution function of their regression model errors (see Lemma A.3 in that article). We obtain a similar result for the estimator F̂ (stated as a proposition below) using arguments analogous to those of Neumeyer and Van Keilegom (2010). These arguments have been omitted for brevity.

Proposition 2. Let the assumptions of Theorem 1 be satisfied with s_0 + b > m. Additionally, assume that F* admits a bounded Lebesgue density f* that satisfies sup_{t∈R} |t f*(t)| < ∞. Then, under the null hypothesis H_0 in (3.1), the analogous asymptotic negligibility holds for the estimator F̂.

We are now prepared to state the proof of Theorem 2.