Consistency of logistic classifier in abstract Hilbert spaces

We study the asymptotic behavior of the logistic classifier in an abstract Hilbert space and establish its consistency under realistic conditions on the distribution of the data. The number $k_n$ of parameters estimated via maximum quasi-likelihood is allowed to diverge so that $k_n/n \to 0$ and $n\tau_{k_n}^4 \to \infty$, where $n$ is the number of observations and $\tau_{k_n}$ is the variance of the last principal component of the data used for estimation. To the best of our knowledge, this is the first consistency result for the logistic classifier when the data are assumed to come from a Hilbert space.


Introduction
Functional Data Analysis (FDA) is an active research area in statistics that comprises a collection of theorems and methods for dealing with infinite-dimensional (functional) data (see Ramsay & Silverman (2002) and Ramsay & Silverman (2005) for an overview). Classification of functional data is one of the most active topics in FDA, and establishing the consistency of various classifiers for functional data has been of great research interest for more than a decade.
Most classifiers assign an observation to the class with the largest estimated posterior probability. Consistency of such a classifier is then implied by the consistency of the estimate of that probability. If the probability depends on a finite number of unknown parameters, as in the logistic model in $\mathbb{R}^k$, then it suffices to estimate all the parameters consistently. For example, in the $\mathbb{R}^k$ case the logistic classifier has been proved to be consistent, strongly consistent (see, e.g., Chen et al. (1999)) and even uniformly consistent (Kazakeviciute & Olivo (2016)).
The situation becomes more complicated if the conditional probability is modelled by an infinite number of parameters, as in the logistic model in an infinite-dimensional Hilbert space $E$. In this case we are given independent observations $(X_1, Y_1), \dots, (X_n, Y_n)$ of $(X, Y)$, where $X$ is an $E$-valued random variable and $Y \in \{-1, 1\}$ is its associated class label. Usually, the following 3-step procedure is then used: (1) some orthonormal basis in $E$ is chosen and the observations are replaced by their coefficients in that basis (a finite number, say, $l$, of coefficients are retained), (2) the principal component analysis of the obtained $n \times l$ array of data is performed and the first $k$ principal components are retained, (3) the usual logistic regression on the new $n \times (k+1)$ array of data is performed. From the mathematical point of view this means that we replace the original observations by their orthogonal projections onto some $k$-dimensional subspace $E_k \subset E$ and find the estimate $\hat\theta_{kn}$ of the unknown parameter $\theta_0 \in E$ which maximizes the quasi-likelihood over all $\theta \in E_k$. Of course, if we want to analyze the asymptotic properties of such an estimator (and of the corresponding classifier based on that estimator), we should also assume that $k$ depends on $n$, that is, the final estimator to be analyzed is $\hat\theta_{k_n n}$ for some sequence $k_n \to \infty$.
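To make the three-step procedure concrete, here is a minimal sketch in Python under our own illustrative assumptions (curves sampled on a common grid, a cosine basis, and scikit-learn for the PCA and logistic regression steps); it is not the authors' implementation.

```python
# A minimal sketch (ours) of the three-step procedure described above.
# Assumptions for illustration: curves are sampled on a common grid and a
# cosine basis is used; scikit-learn provides PCA and logistic regression.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def fit_functional_logistic(curves, y, grid, l=20, k=5):
    """curves: (n, m) array of X_i(t_j); y: labels in {-1, 1}."""
    # Step 1: replace each curve by its first l basis coefficients.
    basis = np.stack([np.cos(j * np.pi * grid) for j in range(l)], axis=1)
    w = np.gradient(grid)                      # simple quadrature weights
    coeffs = (curves * w) @ basis              # (n, l) array of coefficients
    # Step 2: keep the first k principal components of the coefficients.
    pca = PCA(n_components=k).fit(coeffs)
    scores = pca.transform(coeffs)             # (n, k) array
    # Step 3: ordinary logistic regression on the scores; the intercept
    # supplies the (k + 1)th column.  A large C approximates the
    # unpenalized quasi-likelihood fit considered in the paper.
    clf = LogisticRegression(C=1e6).fit(scores, y)
    return basis, pca, clf
```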
Note that if $E_k$ is obtained by the procedure described above, then it is a random subspace of $E$ (it depends on the data). This makes the analysis of $\hat\theta_{k_n n}$ rather complicated. Therefore, here we analyze the simpler case where the $E_k$ are non-random. Formally, this means that we omit the principal component analysis step. This approach (call it naïve) is also known in the literature, but in some cases it is not recommended for practical use. For example, Escabias et al. (2007) argued that the naïve approach in the context of functional data introduces multicollinearity (strong dependence among predictors), which in turn causes inaccurate parameter estimates and increases their variance. However, as we show later, the asymptotic results in the case where the $E_k$ are non-random are good in some situations. Moreover, they indicate what can be expected in the general case, because some of the required assumptions are likely to remain in the general setting as well.
In this work we establish the consistency of the logistic classifier under two sets of conditions. The first set consists of three conditions on the distribution of $X$ that are rather simple and, nevertheless, sufficiently general. All three conditions are satisfied if $X$ has a normal distribution in a Hilbert space with zero mean and positive definite covariance form. The second set of conditions bounds the growth rate of $k_n$: we require that $k_n/n \to 0$ and $n\tau_{k_n}^4 \to \infty$, where $\tau_k = \min_{\theta \in E_k,\ \|\theta\| = 1} C(\theta, \theta)$ and $C$ is the moment form of $X$ defined by (3). As we discuss later, $\tau_k$ can be interpreted as the variance of the $k$th theoretical principal component. The first condition requires $k_n$ to be asymptotically smaller than $n$, which is almost necessary. The second condition requires that the variance of the last theoretical principal component tends to 0 more slowly than $n^{-1/4}$, as $n \to \infty$. However, this condition can be relaxed, as our simulation study shows.
In the literature, there are few attempts to study the asymptotic behavior of the logistic estimate when the dimensionality $k_n$ of the data used for estimation diverges together with the sample size. For example, van de Geer (2008), Fan & Song (2010) and Wang (2011) studied related but slightly different problems, namely models that include some kind of penalty on the parameter vector, such as the Lasso. At first glance it could seem that a very close attempt to solve the described problem was that of Liang & Du (2012), who proved the asymptotic normality of the parameter estimate under mild conditions. However, the fundamental difference between their work and ours is that they did not consider the covariates $X$ to be random, while we do. In principle, results for the model with non-random data can also be applied to the case where the data are random, provided that the assumptions used for non-random data are satisfied for each realization of the random data. However, we cannot apply their result to solve our problem, because one of their assumptions translates as $\inf_k \tau_k > 0$, which does not hold if the data come from a Hilbert space and follow a normal distribution there: in such a situation we can always select a basis system $\{e_j\}$ such that the coordinates of $X$ are uncorrelated, and then $\tau_k$ is the variance of the $k$th coordinate, which tends to 0 because the variances are summable.

The results nearest to ours are achieved in Müller & Stadtmüller (2005). In that paper, the authors studied generalized linear models with no penalty and established asymptotic normality for a properly scaled distance between the estimated and the true parameters. However, they assume (see their assumption (M1)) that if $\operatorname{Var}_X Y = \sigma^2(\mathrm{E}_X Y)$ (where $\mathrm{E}_X$ and $\operatorname{Var}_X$ denote the conditional mean and conditional variance, given $X$, respectively), then the function $\sigma$ is bounded away from 0: $\sigma^2(\mu) \ge \delta > 0$ for all $\mu$. This is not the case for the logistic regression model, where $\sigma^2(\mu) = \mu(1-\mu)$. This means that the results in Müller & Stadtmüller (2005) cannot be applied to prove the consistency of the logistic classifier as considered in this work. Moreover, Müller & Stadtmüller (2005) approximated the infinite-dimensional model by a finite-dimensional one, that is, they assumed that the distribution of $Y$ depends on the projection of $\theta_0$ onto some subspace $E_k$ rather than on the full $\theta_0 \in E$, and assumed that the error of such an approximation tends to 0. However, we could not find any proof of the latter rather complicated statement. No such approximation is involved in our work.
Our paper is organized as follows. In Section 2 we describe the statistical problem considered, explicitly state the assumptions, discuss them, and state our main result. In Section 3 we provide a simulation study to check the necessity of the assumptions, and we end this work with a brief discussion in Section 4. All proofs are deferred to Section 5.

Consistency
Let $E$ be a separable Hilbert space with the inner product $\langle\cdot,\cdot\rangle$. Let $X \in E$ be a Hilbert space-valued random variable and $Y$ a random variable taking the values $-1$ and $1$, with conditional probabilities (w.r.t. $X$) being $1 - p_{\theta_0}(X)$ and $p_{\theta_0}(X)$, respectively. Here $\theta_0 \in E$ is an unknown parameter and
\[
p_\theta(x) = \frac{1}{1 + e^{-\langle\theta, x\rangle}} .
\]
For example, if $E = \ell^2$, the space of all square-summable sequences, then $\langle\theta, x\rangle = \sum_j \theta_j x_j$. Since $E$ can be any Hilbert space, we will work with the general notation $\langle\theta, x\rangle$ instead.
Naturally, for various practical tasks it is of great interest to provide an estimate of $p_{\theta_0}$.
Let $(E_k)$ be some fixed sequence of linear subspaces of the space $E$ such that the following conditions are satisfied: (1) $\dim E_k = k$ for all $k$, (2) $E_k \subset E_{k+1}$ for all $k$, and (3) $\bigcup_k E_k$ is dense in $E$. For any $k$ and $n$ define
\[
\hat\theta_{kn} = \arg\min_{\theta \in E_k} M_n(\theta), \qquad
M_n(\theta) = \frac{1}{n}\sum_{i=1}^n \log\bigl(1 + e^{-Y_i\langle\theta, X_i\rangle}\bigr). \tag{1}
\]
Note that taking $\theta \in E_k$ in the above expression introduces some approximation error. To force this error to tend to 0 as $n$ diverges, fix some sequence $(k_n)$ and set
\[
\hat\theta = \hat\theta_{k_n n}, \qquad \hat p = p_{\hat\theta}. \tag{2}
\]
We will call $\hat p$ the logistic estimate of the true conditional probability $p_{\theta_0}$. For example, let $E = L^2(T)$, where $T \subset \mathbb{R}$ is an interval and $L^2(T)$ is the space of square-integrable real functions on $T$, endowed with the usual inner product $\langle x_1, x_2\rangle = \int_T x_1(t)x_2(t)\,dt$. The standard method for obtaining the logistic estimate from a given sample $(X_1, Y_1), \dots, (X_n, Y_n)$ is expanding $X$ and $\theta$ via selected basis functions $\{e_j\}$, that is, writing $X(t) = \sum_j X_j e_j(t)$ and $\theta(t) = \sum_j \theta_j e_j(t)$, choosing $k = k_n$ and then using (1).

We consider the following statistical task. We want to estimate the unknown true conditional probability $p_{\theta_0}$, given the sample $(X_1, Y_1), \dots, (X_n, Y_n)$ from the distribution of $(X, Y)$. The quality of the estimate $\hat p$ is assessed by the risk $\mathrm{E}|\hat p(X) - p_{\theta_0}(X)|$. If the risk tends to 0, as $n \to \infty$, the estimate $\hat p$ is called consistent. It is well known that if $\hat p$ is consistent, then the empirical classifier, which assigns $x$ to the class 1 whenever $\hat p(x) > 1/2$, is also consistent (see, e.g., van Ryzin (1966) or Kazakeviciute et al. (2017)). Here we consider the logistic estimate (2), where we set $\hat\theta_{kn} = 0$ if the minimum in (1) is not attained or is not unique.
We will say that the distribution of $X$ is of full rank if $P(\langle\theta, X\rangle = 0) = 0$ for all $\theta \neq 0$. Also, recall that a family of random variables $(Z_s)$ is called uniformly integrable if $\sup_s \mathrm{E}|Z_s|\mathbf{1}_{\{|Z_s| > c\}} \to 0$, as $c \to \infty$. The consistency of the logistic estimate will be proved under three assumptions on the distribution of $X$: a full-rank assumption (FR), stating that the distribution of $X$ is of full rank; a moment assumption (M); and a uniform integrability assumption (UI). Assumption (M) implies that the mean of $X$ and the second moment form of $X$ are correctly defined. The mean is the unique vector $\mathrm{E}X \in E$ such that $\langle\theta, \mathrm{E}X\rangle = \mathrm{E}\langle\theta, X\rangle$ for all $\theta \in E$. The second moment form is defined by
\[
C(\theta_1, \theta_2) = \mathrm{E}\,\langle\theta_1, X\rangle\langle\theta_2, X\rangle . \tag{3}
\]
For example, if $E = \ell^2$ and $\mathrm{E}X = 0$, then $C(\theta_1, \theta_2) = \sum_{i,j} c_{ij}\theta_{1i}\theta_{2j}$, where $(c_{ij})$ is the covariance matrix of the coordinates of the random vector $X$. Since $E$ can be any abstract Hilbert space, we will work with the general notation $C(\theta_1, \theta_2)$.
The second moment form is a continuous bilinear form on $E$. Moreover, it is symmetric and positive semi-definite, that is, $C(\theta_1, \theta_2) = C(\theta_2, \theta_1)$ and $C(\theta, \theta) \ge 0$ for all $\theta, \theta_1, \theta_2 \in E$. Obviously, $C(\theta, \theta) = 0$ if and only if $P(\langle\theta, X\rangle = 0) = 1$. This implies that $C(\theta, \theta) > 0$ if and only if $P(\langle\theta, X\rangle = 0) < 1$. Recall that assumption (FR) is $P(\langle\theta, X\rangle = 0) = 0$ for all $\theta \neq 0$. Hence assumption (FR) is slightly stronger than the requirement that $C$ be positive definite.
The required conditions are realistic and hold in a variety of real-life settings. For example, all three assumptions hold if $X$ is a normally distributed random vector with zero mean and positive definite covariance form. Indeed, then $\mathrm{E}\|X\|^s < \infty$ for all $s$, and $\langle\theta, X\rangle$ is distributed as $\sqrt{C(\theta, \theta)}\,Z$, where $Z$ is a random variable that follows a standard normal distribution; in particular, $P(\langle\theta, X\rangle = 0) = 0$ for all $\theta \neq 0$. Denote
\[
\tau_k = \min_{\theta \in E_k,\ \|\theta\| = 1} C(\theta, \theta). \tag{4}
\]
Here $C$ is the moment form of $X$, defined by (3). For example, if $E = \ell^2$, the $E_k$ satisfy the conditions mentioned above, $\mathrm{E}X = 0$, the coordinates of $X$ are uncorrelated and their variances decrease, then $\tau_k$ is the variance of the $k$th coordinate. In other words, $\tau_k$ is the variance of the $k$th theoretical principal component.
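To see why, in this example, the minimum in (4) is attained at the $k$th coordinate, the following display gives the short computation; it is our worked illustration, under the assumption that $E_k = \mathrm{span}(e_1, \dots, e_k)$.

```latex
% Worked illustration (ours): computing \tau_k in the \ell^2 example with
% E_k = span(e_1, ..., e_k), EX = 0 and uncorrelated coordinates whose
% variances \sigma_1^2 \ge \sigma_2^2 \ge \cdots decrease.
\[
\tau_k
  = \min_{\theta \in E_k,\ \|\theta\| = 1} C(\theta, \theta)
  = \min_{\sum_{j \le k} \theta_j^2 = 1} \ \sum_{j \le k} \sigma_j^2 \theta_j^2
  = \sigma_k^2 ,
\]
% the minimum being attained at \theta = e_k, because \sigma_k^2 is the
% smallest of \sigma_1^2, ..., \sigma_k^2.
```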
Our main result is the following Theorem.

Theorem 1. Suppose that assumptions (FR), (M) and (UI) hold. If $k_n/n \to 0$ and $n\tau_{k_n}^4 \to \infty$, then the logistic estimate $\hat p$ is consistent.
Note that the condition $n\tau_{k_n}^4 \to \infty$ requires the data to be such that the variance of the last principal component tends to 0 more slowly than $n^{-1/4}$, as $n \to \infty$. This in turn suggests that the data cannot be sufficiently explained by only a few principal components.
In statistics, the logistic model with an intercept is usually preferred over the one without it, because useful model information might be incorporated in the intercept term. Theorem 1 implies the analogous result on the logistic estimate when the model with an intercept is considered, that is, when the conditional probability that $Y = 1$, given $X = x$, is defined by
\[
p_{\alpha_0, \theta_0}(x) = \frac{1}{1 + e^{-\alpha_0 - \langle\theta_0, x\rangle}} . \tag{5}
\]
In this case, assumption (FR) should be changed to (FR'): $P(\langle\theta, X\rangle = \alpha) = 0$ for all $\theta \neq 0$ and $\alpha \in \mathbb{R}$.
We call $p_{\hat\alpha, \hat\theta}$ the logistic estimate of (5), if
\[
(\hat\alpha, \hat\theta) = \arg\min_{\alpha \in \mathbb{R},\ \theta \in E_{k_n}} \frac{1}{n}\sum_{i=1}^n \log\bigl(1 + e^{-Y_i(\alpha + \langle\theta, X_i\rangle)}\bigr).
\]
We say that the logistic estimate is consistent if $\mathrm{E}|p_{\hat\alpha, \hat\theta}(X) - p_0(X)| \to 0$, as $n \to \infty$, where in this case $p_0(x) = p_{\alpha_0, \theta_0}(x)$. As before, $\tau_k$ is defined by (4), where now $C$ is the covariance form of $X$. Our last result is the following Theorem.

Theorem 2. Suppose that assumptions (FR'), (M) and (UI) hold. If $k_n/n \to 0$ and $n\tau_{k_n}^4 \to \infty$, then the logistic estimate $p_{\hat\alpha, \hat\theta}$ is consistent.

Simulation study
To investigate the need for the conditions required for consistency, we performed a simulation study. We give two examples: one where all assumptions hold, and another where the assumption $n\tau_k^4 \to \infty$ does not hold.
Example 1. Since $X_i(t) = \sum_j C_{ij} e_j(t)$ for any selected basis system, it is enough to generate the coefficients $C_{ij}$. To go in line with assumption (UI), we generate the $C_{ij}$ as independent and normally distributed variables with zero mean and variances $\sigma_j^2 = 1.1^{-j}$. Then $\tau_k = \sigma_k^2$. If we want $n\tau_k^4 = n\,1.1^{-4k}$ to tend to $\infty$, we have to take $k = c\log n$ with $c < 1/(4\log 1.1) \approx 2.62$. In this example we took $c = 2$, so that $n\tau_k^4 \to \infty$ and all assumptions hold. We took $\theta_0$ with $\theta_{0i} = 1.1^{-i}$ and calculated $p_{\theta_0}(X_i)$ up to the precision $\varepsilon = 10^{-4}$. To this end we generated additional coordinates $X_{ij}$ for $j \le l$, where $l$ was the first index with $|\theta_{0l} X_{il}| < \varepsilon$.
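For concreteness, the data-generating step just described can be sketched as follows in Python; the function and variable names are ours, and the truncation is simplified to a fixed large maximal dimension instead of the first index with $|\theta_{0l} X_{il}| < \varepsilon$.

```python
# A sketch (ours) of the data generation in Example 1.  For simplicity we
# generate a fixed large number of coordinates; for max_dim = 400 the
# neglected terms lie far below the precision eps = 1e-4 used in the text.
import numpy as np

rng = np.random.default_rng(0)

def generate_example1(n, c=2.0, max_dim=400):
    k = int(np.ceil(c * np.log(n)))              # k = c log n, here c = 2
    j = np.arange(1, max_dim + 1)
    sigma2 = 1.1 ** (-j)                         # variances 1.1^{-j}
    X = rng.normal(0.0, np.sqrt(sigma2), size=(n, max_dim))
    theta0 = 1.1 ** (-j)                         # theta_{0i} = 1.1^{-i}
    p0 = 1.0 / (1.0 + np.exp(-X @ theta0))       # true probabilities
    Y = np.where(rng.uniform(size=n) < p0, 1, -1)
    return X[:, :k], Y, p0                       # first k coords for fitting
```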
We generated 300, 500, 1000, 1500 and 2000 observations, respectively, over 100 independent runs for each setting, and each time we approximated the distance
\[
d(\hat p, p_0) = \mathrm{E}|\hat p(X) - p_0(X)| = \mathrm{E} f(U), \qquad
f(u_1, u_2) = \Bigl|\frac{1}{1+e^{-u_1}} - \frac{1}{1+e^{-u_2}}\Bigr| ,
\]
with $U = (U_1, U_2)$ distributed according to the normal law with zero mean and covariance matrix built from the forms $C(\hat\theta, \hat\theta)$, $C(\hat\theta, \theta_0)$ and $C(\theta_0, \theta_0)$. We calculated $f$ using the Monte Carlo method: we simulated 10000 independent copies of $U$, which, as preliminary testing shows, gives approximately 0.01 precision for $d$. We also reported the misclassification rate, where we set $\hat y_i = 1$ if $\hat p(x_i) \ge 1/2$. Moreover, we reported the Bayes risk, where the probability of misclassification was calculated by
\[
\mathrm{E}\min\bigl(p_0(X),\, 1 - p_0(X)\bigr) = \mathrm{E}\,\frac{1}{1 + e^{|U|}} , \tag{7}
\]
where $U \sim N(0, 1/(1.1^3 - 1))$. Again, we used the Monte Carlo method to calculate (7). Figure 1 illustrates the simulated coefficients as well as the difference $p_0 - \hat p$ between the true and the estimated conditional probabilities. The x-axis represents the observation number $i$ and the y-axis shows the value of $p_0 - \hat p$ at $x = x_i$, $i = 1, \dots, n$.
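A Monte Carlo sketch of this evaluation step is given below; the bivariate covariance construction is our reconstruction of the text (conditionally on $\hat\theta$, the pair $U = (\langle\hat\theta, X\rangle, \langle\theta_0, X\rangle)$ is bivariate normal, since $X$ is Gaussian with uncorrelated coordinates).

```python
# Monte Carlo approximation (ours) of d(p_hat, p0) and of the Bayes risk.
import numpy as np

rng = np.random.default_rng(1)

def mc_distance(theta_hat, theta0, sigma2, m=10_000):
    k = len(theta_hat)
    # C(a, b) = sum_j sigma2_j a_j b_j for uncorrelated coordinates.
    c11 = np.sum(sigma2[:k] * theta_hat ** 2)
    c12 = np.sum(sigma2[:k] * theta_hat * theta0[:k])
    c22 = np.sum(sigma2[: len(theta0)] * theta0 ** 2)
    U = rng.multivariate_normal([0.0, 0.0], [[c11, c12], [c12, c22]], size=m)
    p = 1.0 / (1.0 + np.exp(-U))                 # logistic of both columns
    return np.mean(np.abs(p[:, 0] - p[:, 1]))

def mc_bayes_risk(m=10_000):
    # U ~ N(0, 1/(1.1^3 - 1)); the Bayes error is E min(p0, 1 - p0).
    U = rng.normal(0.0, np.sqrt(1.0 / (1.1 ** 3 - 1.0)), size=m)
    return np.mean(1.0 / (1.0 + np.exp(np.abs(U))))
```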
We can see that the differences between the true and the estimated conditional probabilities are distributed more or less normally around zero and that their variance decreases as $n$ increases, suggesting that the average difference between the two probabilities tends to zero. This is further confirmed by the $d(\hat p, p_0)$ values in Table 1, which contains numerical results averaged over 100 independent runs. As we can see from Table 1, the assumption $n\tau_k^4 \to \infty$ holds and $d(\hat p, p_0) \to 0$, as expected.

Example 2. We considered the same settings as for Example 1, except that now we took $c = 6$, so that $n\tau_k^4 \to 0$ and even $n\tau_k^2 \to 0$. Figure 2 illustrates the simulated data as well as the difference between the true and the estimated conditional probabilities, while numerical results, averaged over 100 independent runs, are displayed in Table 2. As we can see from Table 2, the assumption $n\tau_k^4 \to \infty$ (and even the weaker assumption $n\tau_k^2 \to \infty$) is violated, but $d(\hat p, p_0) \to 0$ regardless. This suggests that the assumption $n\tau_k^4 \to \infty$ might not be needed to establish the consistency of the logistic estimate and could be relaxed in future investigations.

Discussion
As we noted in the previous Section, the assumption $n\tau_{k_n}^4 \to \infty$ does not seem to be necessary for our main result to hold. It is interesting that the analogous assumption (M3) in Müller & Stadtmüller (2005) is of the same kind. What the optimal lower bound for $\tau_{k_n}$ is, or how Theorem 1 could be proved under an assumption weaker than $n\tau_{k_n}^2 \to \infty$, is not clear.

Facts from probability theory
Further in this Section, $\to_p$ and $\to_d$ denote convergence in probability and convergence in distribution, respectively, while $\to$ is used for the usual convergence in $\mathbb{R}$ or convergence in norm in $E$. For convenience of reference we recall some well-known facts about convergence and uniform integrability of random variables.
Proposition 1 (Continuous mapping theorem, see Kallenberg (2001), Theorem 3.7). Let $U_n$ and $U$ be random elements of some metric space $S$ with $P(U \in C) = 1$, let $T$ be another metric space, and let $f_n, f$ be measurable functions from $S$ to $T$. If $U_n \to_d U$ and $f_n(s_n) \to f(s)$ whenever $s_n \to s \in C$, then $f_n(U_n) \to_d f(U)$.

Proposition 2 (Subsequence criterion, see Kallenberg (2001), Lemma 3.2). Let $U_n$ and $U$ be random elements of some metric space $S$. Then $U_n \to_p U$ if and only if each subsequence of $(U_n)$ has a further subsequence which converges in probability to $U$.

The function $M(\theta)$
We begin by establishing some properties of the function $M(\theta) = \mathrm{E}\, m_\theta(X, Y)$. Recall that $\theta_0$ denotes the "true" value of the parameter $\theta$.
Proof. 1. The inequality $M(\theta) > 0$ is implied by the fact that $m_\theta(x, y) > 0$ for all $x$ and $y$. The remaining bounds follow because the log function is increasing, and, finally, the convexity of the function $-\log$ yields the convexity statement. 2. The statement follows from the dominated convergence theorem, since the integrands admit an integrable dominating bound. 3. By Proposition 2, we have to prove that any subsequence $(\langle\theta_{n_k}, X\rangle)$ contains a further subsequence that tends in probability to $\langle\theta_0, X\rangle$. Note that $M(\theta_{n_k}) \to M(\theta_0)$; therefore, for ease of notation, we omit the index $k$.
The sequence of random vectors $(\langle\theta_n, X\rangle, \langle\theta_0, X\rangle)$ is tight in the space $\bar{\mathbb{R}} \times \mathbb{R}$, where $\bar{\mathbb{R}} = [-\infty, \infty]$. Indeed, if $K \subset \mathbb{R}$ is a compact interval such that $P(\langle\theta_0, X\rangle \in K) \ge 1 - \varepsilon$ (and we can always find such a $K$), then the set $\bar{\mathbb{R}} \times K$ is also compact and, for all $n$, $P\bigl((\langle\theta_n, X\rangle, \langle\theta_0, X\rangle) \in \bar{\mathbb{R}} \times K\bigr) \ge 1 - \varepsilon$. By Prokhorov's theorem (see Kallenberg (2001), Theorem 14.3), there exists a subsequence $(\langle\theta_{n_k}, X\rangle, \langle\theta_0, X\rangle)$ which converges in distribution in the space $\bar{\mathbb{R}} \times \mathbb{R}$ to some random vector $(U_1, U_2)$.
By Proposition 5, we may pass to the limit in the corresponding expectations. Obviously, $U_2$ is distributed identically to $\langle\theta_0, X\rangle$, and hence the limit relation turns into an inequality between expectations. Let $V$ be a random variable taking the values $-1$ and $1$ with (conditional w.r.t. $(U_1, U_2)$) probabilities $\frac{1}{1+e^{U_2}}$ and $\frac{1}{1+e^{-U_2}}$, respectively. Then the above inequality can be rewritten in terms of $V$, and an application of Jensen's inequality yields the reverse inequality. Therefore, both inequality signs can be replaced by equalities. However, Jensen's inequality becomes an equality if and only if the variable being integrated is almost surely constant. In this case that constant is 0, that is, $U_1 = U_2$ almost surely. Hence $(\langle\theta_{n_k}, X\rangle, \langle\theta_0, X\rangle) \to_d (U_2, U_2)$ and therefore $\langle\theta_{n_k}, X\rangle - \langle\theta_0, X\rangle \to_d U_2 - U_2 = 0$. When the limit random variable is 0 (or a constant), convergence in distribution is equivalent to convergence in probability (Kallenberg (2001), Lemma 3.7). Therefore, $\langle\theta_{n_k}, X\rangle - \langle\theta_0, X\rangle \to_p 0$, and so $\langle\theta_{n_k}, X\rangle \to_p \langle\theta_0, X\rangle$.
For any $f \in C^r(E_k)$, its $r$th derivative at the point $\theta \in E_k$ is the symmetric $r$-linear form on $E_k$ defined by
\[
f^{(r)}(\theta)(d\theta_1, \dots, d\theta_r) = \frac{D}{d\theta_1} \cdots \frac{D}{d\theta_r} f(\theta),
\]
where $\frac{D}{d\theta}$ stands for the directional derivative along $d\theta \in E_k$. Its norm is defined by
\[
\|f^{(r)}(\theta)\| = \sup_{\|d\theta_1\| = \dots = \|d\theta_r\| = 1} \bigl|f^{(r)}(\theta)(d\theta_1, \dots, d\theta_r)\bigr| .
\]
The function $d\theta \mapsto f^{(r)}(\theta)(d\theta, \dots, d\theta)$ is called the $r$th differential of $f$ and is denoted by $d^r f(\theta)$. For example, $d^2 f(\theta)$ is the quadratic form associated with the bilinear form $f''(\theta)$.

Proposition 7. If assumptions (FR) and (M) hold, then, for any $k$, the function $M$ attains its minimum on $E_k$ at a unique point $\theta_k$.
Step 1: we will prove that the sub-level sets $A_q = \{\theta \in E_k : M(\theta) \le q\}$ are bounded. Suppose the contrary. Then there exists some set $A_q$ that is not bounded. Find a sequence $(\theta_m) \subset E_k$ such that $M(\theta_m) \le q$ for all $m$, $\|\theta_m\| \to \infty$ and $\theta_m/\|\theta_m\| \to a$, as $m \to \infty$. Because $\|a\| = 1$ and the distribution of $X$ is of full rank, either $\langle a, X\rangle < 0$ or $\langle a, X\rangle > 0$ with positive probability. Since $0 < p_{\theta_0} < 1$, the label $Y$ takes each of the values $\pm 1$ with positive conditional probability, so $m_{\theta_m}(X, Y) \to \infty$ with positive probability, and so $\mathrm{E}\lim_{m\to\infty} m_{\theta_m}(X, Y) = \infty$. On the other hand, by Fatou's lemma, $\liminf_m M(\theta_m) \ge \mathrm{E}\liminf_m m_{\theta_m}(X, Y) = \infty$, which is impossible since $M(\theta_m) \le q$. A contradiction.
Step 2: the end of the proof. The existence of $\theta_k$ follows from Proposition 2.1.1 of Bertsekas et al. (2003). Since $M(\theta)$ is strictly convex, the minimum point is unique.
We are now ready to establish the consistency criterion. The following Proposition provides consistency conditions for an estimate of the type $\hat p = p_{\hat\theta_n}$, where $\hat\theta_n$ is any estimate of $\theta$. If $\hat\theta_n$ is defined by (1)-(2), we get the consistency criterion for the logistic estimate.
1. If $M(\hat\theta_n) \to_p M(\theta_0)$, then the estimate $p_{\hat\theta_n}$ is consistent.

2. Suppose assumptions (FR) and (M) hold, and $\theta_k$ is the minimum point of the function $M$ in the space $E_k$. If $k_n \to \infty$ and $M(\hat\theta_n) - M(\theta_{k_n}) \to_p 0$, then the estimate $p_{\hat\theta_n}$ is consistent.
Let now $M(\hat\theta_n) \to_p M(\theta_0)$. We have to prove that $\mathrm{E}|p_{\hat\theta_n}(X) - p_{\theta_0}(X)| \to 0$. It is enough to prove that any subsequence $\mathrm{E}|p_{\hat\theta_{n_s}}(X) - p_{\theta_0}(X)|$ has a further subsequence that tends to 0. Moreover, it is well known that any sequence that converges in probability has a subsequence that converges almost everywhere. Therefore, it is enough to prove that, if almost surely $M(\hat\theta_{n_s}) \to M(\theta_0)$, then $\mathrm{E}|p_{\hat\theta_{n_s}}(X) - p_{\theta_0}(X)| \to 0$. However, if almost surely $M(\hat\theta_{n_s}) \to M(\theta_0)$, then from the first paragraph of this proof we get that almost surely $\mathrm{E}^*|p_{\hat\theta_{n_s}}(X) - p_{\theta_0}(X)| \to 0$, where $\mathrm{E}^*$ denotes the conditional mean w.r.t. the sequence $((X_i, Y_i) \mid i \ge 1)$. It is then enough to use the dominated convergence theorem.
2. The second statement follows from the first one and from Proposition 7.

The function $M_n(\theta)$
Now suppose that $k$ and $n$ are fixed and consider $M_n(\theta)$ as a function on $E_k$. The function $\theta \mapsto m_\theta(x, y)$ is convex on $E_k$ (a direct verification is sketched below), and then the function $M_n(\theta)$ is convex as well. We first give conditions for its strict convexity. Note that $m_\theta(x, y)$ depends on $\theta$ only through $\langle\theta, x^{(k)}\rangle$, where $X^{(k)}_i$ denotes the projection of the vector $X_i$ onto the space $E_k$. Now suppose $n \ge k$ and let $W_{kn}$ denote the following event: the vectors $X^{(k)}_1, \dots, X^{(k)}_n$ span $E_k$ and the sample is not $k$-separable. If $\omega \in W_{kn}$ then, by Propositions 9 and 10, the function $M_n(\theta)$ is strictly convex and all its sub-level sets $A_q$ are bounded. As is seen from the proof of Proposition 7, $M_n(\theta)$ then has a unique minimum point, which is, of course, $\hat\theta_{kn}(\omega)$. If $\omega \notin W_{kn}$, we set $\hat\theta_{kn}(\omega) = 0$.
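For completeness, the convexity claim can be verified directly; the following short computation is ours and uses the form of $m_\theta$ from (1).

```latex
% Direct verification (ours) of convexity, writing, as in (1),
% m_\theta(x, y) = \log(1 + e^{-y\langle\theta, x\rangle}) with y^2 = 1:
\[
d^2 m_\theta(x, y)(d\theta)
  = \frac{e^{-y\langle\theta, x\rangle}}
         {\bigl(1 + e^{-y\langle\theta, x\rangle}\bigr)^2}\,
    \langle d\theta, x \rangle^2 \;\ge\; 0 ,
\]
% so \theta \mapsto m_\theta(x, y) is convex, and strictly convex along
% every direction d\theta with \langle d\theta, x \rangle \neq 0.
```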

Proof of Theorem 1
We follow the proof of Theorem 5.42 from van der Vaart (2000).
For $k \ge 1$ and $\theta \in E$ define
\[
\psi_{k,\theta}(x, y) = -\frac{y\,x^{(k)}}{1 + e^{y\langle\theta, x\rangle}} ,
\]
where $x^{(k)}$ denotes the orthogonal projection of $x$ onto the space $E_k$. It is obvious that the function $\theta \mapsto \psi_{k,\theta}(x, y)$ is the gradient of the restriction to $E_k$ of the function $\theta \mapsto m_\theta(x, y)$. Set $\Psi_{k,n}(\theta) = \frac{1}{n}\sum_{i=1}^n \psi_{k,\theta}(X_i, Y_i)$ and $\Psi_k(\theta) = \mathrm{E}\,\psi_{k,\theta}(X, Y)$. These functions are the gradients of the functions $M_n(\theta)$ and $M(\theta)$, regarded as functions on $E_k$, respectively. Therefore, both $\Psi_{k,n}$ and $\Psi_k$ are $C^2$-smooth functions from $E_k$ to $E_k$. Moreover, since $M$ is strictly convex on $E_k$, its gradient is strictly monotone: $\langle\Psi_k(\theta_1) - \Psi_k(\theta_2), d\theta\rangle > 0$ whenever $d\theta = \theta_1 - \theta_2 \neq 0$. If $\Psi_k(\theta_1) = \Psi_k(\theta_2)$, this yields $d\theta = 0$, that is, $\theta_1 = \theta_2$. Therefore, the function $\Psi_k$ is injective.
The statement of the theorem now follows from the inverse function theorem.
Proposition 11, together with the inverse function theorem, describes the image under $\Psi_k$ of a neighbourhood of $\theta_k$. The following reasoning is carried out under the assumption that the event $W_{k_n n}$ holds.

Fig 3. Conceptual illustration of the ideas from Theorem 5.42 in van der Vaart (2000), which solve the following well-known problem in statistics: by the Law of Large Numbers, the empirical expectation tends to the true expectation; how does one prove that $\hat\theta_{kn}$, which minimizes the empirical expectation, tends to $\theta_k$, which minimizes the true expectation? As van der Vaart suggests, if the distance between the gradients of the empirical and the true expectations is bounded by $\delta_k$, then the distance between $\hat\theta_{kn}$ and $\theta_k$ is bounded by $d_k$.
Therefore, in order to prove Theorem 1 it is enough to choose $\delta_k$ in such a way that $d_{k_n} \to 0$ and $P(W^c_{k_n n}) \to 0$. We now need to evaluate the diameter $d_k$. The following Proposition gives the necessary result.

Proposition 12. Suppose assumptions (FR), (M) and (UI) are satisfied and $\delta_k \to 0$. Then $d_k = O(\delta_k/\tau_k)$.
The proof of Proposition 12 is preceded by three lemmas.
Lemma 1. Let $(Z_n)$ be a sequence of positive integrable random variables such that the sequence $(Z_n/\mathrm{E}Z_n)$ is uniformly integrable. Then, for all $q < 1$, $\inf_n \mathrm{P}(Z_n > q\,\mathrm{E}Z_n) > 0$.

Proof. Suppose the contrary. Without loss of generality, we can assume that $\mathrm{P}(Z_n > q\,\mathrm{E}Z_n) \to 0$. From uniform integrability we get that $\mathrm{E}(Z_n/\mathrm{E}Z_n)\mathbf{1}_{\{Z_n > q\,\mathrm{E}Z_n\}} \to 0$. Therefore, there exists $n$ such that $\mathrm{E}(Z_n/\mathrm{E}Z_n)\mathbf{1}_{\{Z_n > q\,\mathrm{E}Z_n\}} < 1 - q$. But then $1 = \mathrm{E}(Z_n/\mathrm{E}Z_n) \le q + \mathrm{E}(Z_n/\mathrm{E}Z_n)\mathbf{1}_{\{Z_n > q\,\mathrm{E}Z_n\}} < 1$. A contradiction.
Proof. Fix $\varepsilon > 0$ and find $c_1$ such that $\sup_n \mathrm{E}Z_n\mathbf{1}_{\{Z_n > c_1\}} < \varepsilon$.
Then find $c$ such that the corresponding supremum over $n$ is smaller than $\varepsilon$. Then the required bound holds for all $n$, and therefore $\tilde U_n = O_p(1)$. Now we are ready to prove Proposition 12.
Proof. Lemma 2 implies that if $k$ is large enough then, for any $d\theta \in E_k$ with $\|d\theta\| = 1$, at least one of the values of the function $f(t) = \langle\Psi_k(\theta_k + t\,d\theta), d\theta\rangle$ is greater than $\delta_k$. The function is continuous, strictly increasing and equal to 0 when $t = 0$. Therefore, there exists a unique $t = t_k(d\theta) > 0$ such that $\langle\Psi_k(\theta_k + t\,d\theta), d\theta\rangle = \delta_k$.
Step 1: we will prove that $d_k \le 2\alpha_k$, where $\alpha_k = \sup_{\|d\theta\| = 1} t_k(d\theta)$.

Step 2: transforming the task to a simpler one.
From the result in Step 1 we get that it is enough to prove that $\alpha_k \tau_k/\delta_k = O(1)$. Suppose the contrary, that there exists some subsequence that is unbounded. Then, without loss of generality, we can assume that $\alpha_k \tau_k/\delta_k \to \infty$, and we need to get a contradiction.
Let $d\theta_k$ be unit-length vectors from $E_k$ such that $t_k(d\theta_k)/\alpha_k \to 1$, and denote the corresponding normalized quantities for short. It then remains to obtain a contradiction.
Step 4: the case where $u_k \to u < \infty$. The sequence on the left is not greater than 1 for all $t$. Therefore, by the dominated convergence theorem, $g_{u_k}(y_k, z_{1k}, z_{2k}) \to g_u(y, z_1, z_2)$. Then, by Proposition 1, the corresponding random variables converge in distribution. The sequence of random variables on the left-hand side is not greater than 1. Therefore, by Proposition 4, their expectations converge as well. We get a contradiction, because the function $g_u$ is everywhere positive.
Step 5: the case where $u_k \to \infty$. From the estimates above we get that the sequence of random variables $(1/|\tilde Z_{2k}|)$ is uniformly integrable.
In other words, the limit on the left-hand side equals the corresponding expectation. The sequence of random variables on the left-hand side is dominated by the sequence $(1/|\tilde Z_{2k}|)$, which is uniformly integrable. Therefore, by Proposition 4, the expectations converge as well. Again, we get a contradiction, because the limit is almost surely positive.

It remains to estimate the probability $P(W^c_{k_n n})$. In order to do this, we have to estimate $\sup_{\theta \in \bar U_k} \|\Psi_{k,n}(\theta) - \Psi_k(\theta)\|$. Fix $\theta \in \bar U_k$ and denote $d\theta = \theta - \theta_k$. By using Taylor's expansion we get
\[
\Psi_{k,n}(\theta) - \Psi_k(\theta)
 = \bigl(\Psi_{k,n}(\theta_k) - \Psi_k(\theta_k)\bigr)
 + \bigl(\Psi_{k,n}'(\theta_k) - \Psi_k'(\theta_k)\bigr)\,d\theta
 + R(d\theta). \tag{12}
\]
The first term on the right-hand side of (12) is estimated as follows. Let $(e_1, \dots, e_k)$ be an orthonormal basis of $E_k$. Applying Chebyshev's inequality coordinatewise, the probability that we are interested in does not exceed $9\,\mathrm{E}\|X\|^2/(n\delta_{k_n}^2)$.
Similarly, we can evaluate the second term of (12). Again, we would like to apply Chebyshev's inequality. However, since $\Psi_{k,n}$ is a vector-valued function, its derivative is a linear operator, which makes the exact computation of its norm very complex. To make things simpler, we can use the Hilbert-Schmidt norm instead, which is known to be greater than the usual norm. The third term of (12) tends to 0 if $d_{k_n}^2/\delta_{k_n} \to 0$. Therefore, Theorem 1 will be proved if we can select $\delta_k$ such that $d_{k_n} \to 0$, $n\delta_{k_n}^2 \to \infty$, $P(W^c_{k_n n}) \to 0$ and $d_{k_n}^2/\delta_{k_n} \to 0$; the third condition is implied by the first and the second ones. If we take $\delta_k = o(\tau_k^2)$, then the first and the fourth conditions are met, because then $d_k = O(\delta_k/\tau_k) = o(1)$ and $d_k^2/\delta_k = O(\delta_k/\tau_k^2) = o(1)$. Therefore, it is enough to select $\delta_k = o(\tau_k^2)$ such that $n\delta_{k_n}^2 \to \infty$, that is, in such a way that asymptotically $n^{-1/2} \prec \delta_{k_n} \prec \tau_{k_n}^2$, where $a \prec b$ means that $a = o(b)$. Clearly, we can achieve this if $n^{-1/2} = o(\tau_{k_n}^2)$, that is, if $n\tau_{k_n}^4 \to \infty$, which is exactly the assumption of Theorem 1.
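For concreteness, one admissible choice of $\delta_{k_n}$ (ours, for illustration) that makes the sandwich explicit is $\delta_{k_n} = n^{-1/4}\tau_{k_n}$, as the following short computation shows.

```latex
% One admissible choice (ours, for concreteness): \delta_{k_n} = n^{-1/4}\tau_{k_n}.
\[
\frac{\delta_{k_n}}{\tau_{k_n}^2}
  = \frac{1}{\bigl(n \tau_{k_n}^4\bigr)^{1/4}} \to 0,
\qquad
n \delta_{k_n}^2
  = \bigl(n \tau_{k_n}^4\bigr)^{1/2} \to \infty,
\]
% so n^{-1/2} \prec \delta_{k_n} \prec \tau_{k_n}^2 holds precisely because
% n\tau_{k_n}^4 \to \infty.
```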
Fig 1. Illustration of simulated data for Example 1. (a)-(c) Simulated coefficients $C_{ij}$ for n = 300, 1000 and 2000, respectively. (d)-(f) Difference $(p_0 - \hat p)$ between the true conditional probability $p_0$ and the estimated conditional probability $\hat p$, evaluated for the generated observations.

Fig 2. Illustration of simulated data for Example 2. (a)-(c) Simulated coefficients $C_{ij}$ for n = 300, 1000 and 2000, respectively. (d)-(f) Difference $(p_0 - \hat p)$ between the true conditional probability $p_0$ and the estimated conditional probability $\hat p$, evaluated for the generated observations.

Table 1
Numerical results for Example 1, averaged over 100 independent runs

Table 2
Numerical results for Example 2, averaged over 100 independent runs